vertexclique commented on pull request #8430: URL: https://github.com/apache/arrow/pull/8430#issuecomment-707341797
@nevi-me

> I didn't do a detailed review, but I'm happy with the changes. It's been a while since I looked at the JSON reader, how hard/easy do you think it would be for us to support the outstanding work on https://issues.apache.org/jira/browse/ARROW-4534?

Thanks! On the nested reading part especially (https://issues.apache.org/jira/browse/ARROW-4544), it would be nice to reuse builders at entry. A recursive reader with a `recursion_limit` set would be good to go. If we go with the iterative approach instead, we would have to explicitly generate a macro that expands at compile time with the depth embedded in it. That might slow down compile times and create larger binaries. The good part of the recursive approach is that it is limited only by the stack size (though there might be a growing-stack implementation), which the user can increase by hand. The bad part is that the recursion limit we define must not hit the default stack size, so we need to find a sweet spot for it.

About the other ticket that is still open (https://issues.apache.org/jira/browse/ARROW-4803): type inference for the schema might be hard at first. Even so, we can do assumption-based parsing, reading the first record's data (or +2) to infer the type. When the type is given, we can try to parse everything in ISO format.

> I also have the feeling that the reader might be slower than other readers. What has been your experience @vertexclique?

I didn't test the performance; I was using this in tests, so I needed it. Maybe we can create a benchmark for r/w? wdyt?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
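
To make the `recursion_limit` idea from the comment above concrete, here is a rough sketch of a depth-guarded recursive walk over nested values. Everything in it is an assumption for illustration: `Value`, `ReadError`, `count_leaves` and `nested_list` are hypothetical names, not the Arrow JSON reader's types or API, and counting leaves stands in for whatever the real reader would do with its builders.

```rust
// Hypothetical stand-in for a parsed JSON value; not the Arrow or serde_json type.
#[derive(Debug)]
enum Value {
    Null,
    Number(f64),
    List(Vec<Value>),
    Struct(Vec<(String, Value)>),
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ReadError {
    RecursionLimitExceeded { limit: usize },
}

/// Counts leaf values while descending into nested lists and structs.
/// Returns an error as soon as the nesting depth goes past `recursion_limit`,
/// so a deeply nested input fails cleanly instead of overflowing the stack.
fn count_leaves(
    value: &Value,
    depth: usize,
    recursion_limit: usize,
) -> Result<usize, ReadError> {
    if depth > recursion_limit {
        return Err(ReadError::RecursionLimitExceeded { limit: recursion_limit });
    }
    match value {
        Value::Null | Value::Number(_) => Ok(1),
        Value::List(items) => {
            let mut total = 0;
            for item in items {
                total += count_leaves(item, depth + 1, recursion_limit)?;
            }
            Ok(total)
        }
        Value::Struct(fields) => {
            let mut total = 0;
            for (_name, field) in fields {
                total += count_leaves(field, depth + 1, recursion_limit)?;
            }
            Ok(total)
        }
    }
}

/// Builds a number wrapped in `levels` nested lists, for trying out the limit.
fn nested_list(levels: usize) -> Value {
    let mut value = Value::Number(1.0);
    for _ in 0..levels {
        value = Value::List(vec![value]);
    }
    value
}
```

Calling `count_leaves(&nested_list(3), 0, 3)` succeeds, while the same input with a limit of 2 returns `RecursionLimitExceeded`. The limit itself would still need tuning so that the deepest allowed recursion stays well inside the default stack size.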
