eldenmoon opened a new issue, #11663:
URL: https://github.com/apache/doris/issues/11663

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   Use simdjson to parse JSON documents in the JSON scanner.
   
   ### Solution
   
   Currently we use rapidjson to parse JSON documents. It is fast, but not as 
   fast as 
   [simdjson](https://github.com/simdjson/simdjson/blob/master/doc/basics.md). 
   simdjson also has a parsing front end called `simdjson::ondemand`, which 
   parses JSON lazily as fields are accessed and can return a field's token as 
   a slice of the original document. This allows three optimizations:
   
   1. Fewer string copies. Today we convert everything to a string literal in 
   `_write_data_to_column` with `sprintf`, and the flame graph shows a hotspot 
   in this function. `simdjson::to_json_string` returns the token (a string 
   piece) as a `std::string_view`, which is exactly what we need.
   2. In `_set_column_value` we could iterate over the JSON document with 
   `for (auto field : object_val) {xxx}`, which is much faster than looking up 
   each field by its name, as in `objectValue.FindMember("k1")`.
   3. The `at_pointer` interface [provided by 
   simdjson](https://github.com/simdjson/simdjson/blob/master/doc/basics.md#json-pointer) 
   can fetch a JSON field directly from the original document.
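
The difference between per-column lookup and single-pass iteration can be sketched without either JSON library. The snippet below is only an illustration under assumptions: `Field`, `fill_by_lookup`, and `fill_by_iteration` are hypothetical names, not Doris or simdjson code, and the fields are assumed to be already tokenized into `std::string_view` slices of the document, the way an on-demand parser could hand them out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one parsed JSON object: (key, raw value token)
// pairs, each a non-owning view into the original document buffer.
using Field = std::pair<std::string_view, std::string_view>;

// Lookup style (like calling FindMember once per column): every column
// triggers its own scan over the fields, so the cost is columns x fields.
std::vector<std::string_view> fill_by_lookup(
        const std::vector<Field>& fields,
        const std::vector<std::string>& columns) {
    std::vector<std::string_view> out(columns.size());
    for (std::size_t c = 0; c < columns.size(); ++c) {
        for (const Field& f : fields) {
            if (f.first == columns[c]) {
                out[c] = f.second;  // copies the view, not the bytes
                break;
            }
        }
    }
    return out;
}

// Iteration style (like `for (auto field : object_val)`): one pass over the
// fields, with a name -> column index map that is built once per schema.
// Note: with duplicate keys this keeps the last value, the lookup version
// keeps the first.
std::vector<std::string_view> fill_by_iteration(
        const std::vector<Field>& fields,
        const std::unordered_map<std::string_view, std::size_t>& column_index,
        std::size_t num_columns) {
    std::vector<std::string_view> out(num_columns);
    for (const Field& f : fields) {
        auto it = column_index.find(f.first);
        if (it == column_index.end()) continue;  // field is not a column
        assert(it->second < num_columns);
        out[it->second] = f.second;
    }
    return out;
}
```

Both functions return views rather than copies, so a column value stays valid only while the document buffer is alive. Columns that do not appear in the object come back as empty views.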
   
   
   Below are the performance results from my benchmark using stream load:
   
   
![image](https://user-images.githubusercontent.com/64513324/184056716-04c73ee4-60dc-490f-a427-b5599ab25c48.png)
   
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

