zhuqi-lucas commented on PR #19924: URL: https://github.com/apache/datafusion/pull/19924#issuecomment-3788950330
> Thank you @zhuqi-lucas -- I think this code looks good and well tested > > I think we should reconsider: > > 1. The name of the option > 2. Performance > > My reading of the code is that this PR will parse an input JSON array 3 times (once for schema inference, one to convert to NDJSON, and then once during the actual parsing) > > I personally recommend we look into avoiding this overhead > > You might be able to re-use the same approach as the arrow-rs parser if you can figure out how to trim the input to remove the first `[` and last `]` rather than having an entirely different code path > > It might be better to add such skipping directly in the arrow reader > > That all being said, I think it would be ok to merge this code in as is and file a ticket to improve it as a follow on Thank you @alamb for review, good suggestion, i will address comments and redesign this PR to make the performance better, also optimize the name. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
