zhuqi-lucas commented on PR #19924:
URL: https://github.com/apache/datafusion/pull/19924#issuecomment-3788950330

   > Thank you @zhuqi-lucas -- I think this code looks good and well tested
   > 
   > I think we should reconsider:
   > 
   > 1. The name of the option
   > 2. Performance
   > 
   > My reading of the code is that this PR will parse an input JSON array 3 
times (once for schema inference, one to convert to NDJSON, and then once 
during the actual parsing)
   > 
   > I personally recommend we look into avoiding this overhead
   > 
   > You might be able to re-use the same approach as the arrow-rs parser if 
you can figure out how to trim the input to remove the first `[` and last `]` 
rather than having an entirely different code path
   > 
   > It might be better to add such skipping directly in the arrow reader
   > 
   > That all being said, I think it would be ok to merge this code in as is 
and file a ticket to improve it as a follow on
   
   Thank you @alamb  for review, good suggestion, i will address comments and 
redesign this PR to make the performance better, also optimize the name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to