eldenmoon opened a new issue, #11663: URL: https://github.com/apache/doris/issues/11663
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

Use simdjson to parse JSON documents in the JSON scanner.

### Solution

Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as [simdjson](https://github.com/simdjson/simdjson/blob/master/doc/basics.md). simdjson has a parsing front-end called `simdjson::ondemand`, which parses JSON lazily as fields are accessed and can return a field's raw token from the original document. This enables three optimizations:

1. Reduce the cost of string copying. Today we convert everything to a string literal in `_write_data_to_column` with `sprintf`, and the flame graph shows a hotspot in this function. `simdjson::to_json_string` returns the token (a string piece) as a `std::string_view`, which is exactly what we need.
2. In `_set_column_value`, iterate through the JSON document with `for (auto field: object_val) {xxx}`. This is much faster than looking up each field by its name, as in `objectValue.FindMember("k1")`.
3. Use the `at_pointer` interface [simdjson provides](https://github.com/simdjson/simdjson/blob/master/doc/basics.md#json-pointer), which gets a JSON field directly from the original document.

Below is the performance result from my benchmark using stream load:

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
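As an illustration of the second point in the Solution section (one pass over the fields instead of one name lookup per column), here is a minimal sketch that is independent of simdjson's actual API. The `Field` struct, `build_slot_index`, and `fill_row` are hypothetical names introduced only for this example; `Field` stands in for what an on-demand parser would yield, with both views pointing into the original JSON buffer so that no value is copied.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one field yielded by an on-demand parser.
// Both views point into the original JSON buffer, so nothing is copied.
struct Field {
    std::string_view key;
    std::string_view raw_value;
};

// Build the column-name -> slot map once per load, not once per row.
// The keys are views into column_names, which must outlive the map.
std::unordered_map<std::string_view, std::size_t> build_slot_index(
        const std::vector<std::string>& column_names) {
    std::unordered_map<std::string_view, std::size_t> index;
    for (std::size_t i = 0; i < column_names.size(); ++i) {
        index.emplace(column_names[i], i);
    }
    return index;
}

// Single pass over the row's fields: each field is dispatched to its slot.
// Fields that match no column are skipped; columns with no field stay empty.
std::vector<std::string_view> fill_row(
        const std::vector<Field>& fields,
        const std::unordered_map<std::string_view, std::size_t>& slot_index,
        std::size_t num_columns) {
    std::vector<std::string_view> row(num_columns);
    for (const Field& f : fields) {
        auto it = slot_index.find(f.key);
        if (it != slot_index.end()) {
            row[it->second] = f.raw_value;
        }
    }
    return row;
}
```

With this shape, the per-row work is one hash lookup per field present in the document, and the row holds views rather than `sprintf`-formatted copies. How this maps onto the real `_set_column_value` and `_write_data_to_column` code paths is not shown here and would depend on the scanner's internals.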
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
