mofeiatwork commented on issue #12459: URL: https://github.com/apache/arrow/issues/12459#issuecomment-1045610272
After reading the Arrow documentation and code, I have implemented a basic row-based converter from Arrow arrays to JSON. It handles most primitive types as well as nested types, including struct, list, and map. Of course, it is less efficient than a columnar-style builder, and I will improve it further.

I think the key question is how to convert nested types into tree-structured JSON through a row-oriented JSON builder, since most JSON builder implementations (such as rapidjson and simdjson) are row-oriented and build JSON documents one at a time.

A basic approach is to iterate over the nested data recursively and build the JSON tree at the same time. The limitation is that iterating over nested data involves a lot of branching, which reduces performance.

So my idea is to separate JSON building into two stages:

- Schema building: build the JSON tree structure according to the Arrow schema
- Leaf filling: fill the leaf nodes of the JSON tree from the Arrow arrays

This way, most of the branching can be eliminated, and access to the Arrow arrays will be cache-friendly.

What do you think?
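To make the two-stage idea concrete, here is a minimal sketch in plain Python rather than Arrow C++. It is only an illustration under simplifying assumptions: the schema is a nested dict (structs only, no list or map types), each leaf column is an ordinary Python list standing in for an Arrow array, and the names `SCHEMA`, `COLUMNS`, `build_skeleton`, `fill_leaves`, and `to_json_rows` are hypothetical, not part of any Arrow API.

```python
import json

# Hypothetical toy schema: nested dicts are structs, strings name flat leaf columns.
SCHEMA = {"id": "id", "pos": {"x": "pos.x", "y": "pos.y"}}

# Toy columnar data: one Python list per leaf column (a stand-in for Arrow arrays).
COLUMNS = {
    "id": [1, 2, 3],
    "pos.x": [0.5, 1.5, 2.5],
    "pos.y": [10.0, None, 30.0],
}


def build_skeleton(schema):
    """Stage 1: build one row's tree shape and record where each leaf lives."""
    slots = []  # (path of keys, column name) for every leaf

    def walk(node, path):
        if isinstance(node, dict):
            return {key: walk(child, path + (key,)) for key, child in node.items()}
        slots.append((path, node))
        return None  # placeholder until the leaf is filled

    return walk(schema, ()), slots


def copy_tree(node):
    """Copy the nested dict structure; leaves stay as placeholders."""
    if isinstance(node, dict):
        return {key: copy_tree(child) for key, child in node.items()}
    return node


def fill_leaves(rows, slots, columns):
    """Stage 2: one loop per leaf column, with no per-value type dispatch."""
    for path, name in slots:
        *parents, last = path
        for row, value in zip(rows, columns[name]):
            target = row
            for key in parents:
                target = target[key]
            target[last] = value


def to_json_rows(schema, columns):
    """Run both stages and serialize each row as one JSON document."""
    skeleton, slots = build_skeleton(schema)
    num_rows = len(next(iter(columns.values())))
    rows = [copy_tree(skeleton) for _ in range(num_rows)]
    fill_leaves(rows, slots, columns)
    return [json.dumps(row) for row in rows]
```

Calling `to_json_rows(SCHEMA, COLUMNS)` returns one JSON string per row. The schema is walked once in stage 1, and stage 2 reads each column sequentially from start to end. This sketch still follows the key path for every value; a real implementation would presumably keep direct references to the leaf nodes instead.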
