Rafferty97 commented on PR #19924:
URL: https://github.com/apache/datafusion/pull/19924#issuecomment-3995648538

   > Thank you @zhuqi-lucas -- I think this code looks good and well tested
   > 
   > I think we should reconsider:
   > 
   > 1. The name of the option
   > 2. Performance
   > 
   > My reading of the code is that this PR will parse an input JSON array 3 
times (once for schema inference, one to convert to NDJSON, and then once 
during the actual parsing)
   > 
   > I personally recommend we look into avoiding this overhead
   > 
   > You might be able to re-use the same approach as the arrow-rs parser if 
you can figure out how to trim the input to remove the first `[` and last `]` 
rather than having an entirely different code path
   > 
   > It might be better to add such skipping directly in the arrow reader
   > 
   > That all being said, I think it would be ok to merge this code in as is 
and file a ticket to improve it as a follow on
   
   Hey @alamb, I've been working on a few PRs in `arrow-rs` that would allow us 
to use the `TapeDecoder` directly and obviate the need for a "JSON array to 
ND-JSON converter". Once those are merged, I will open a PR in Datafusion to do 
just that.
   
   If you're interested, those PRs are:
   * https://github.com/apache/arrow-rs/pull/9496
   * https://github.com/apache/arrow-rs/pull/9494


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to