[I] Parallel NDJSON reading [arrow-datafusion]

via GitHub Mon, 11 Dec 2023 13:55:40 -0800


alamb opened a new issue, #8502:
URL: https://github.com/apache/arrow-datafusion/issues/8502


   ### Is your feature request related to a problem or challenge?
   
   DataFusion can now automatically read CSV and Parquet files in parallel (see 
https://github.com/apache/arrow-datafusion/issues/6325 for CSV).
   
   It would be great to do the same for "NDJSON" files -- namely files that 
contain multiple JSON objects, one per line.
   
   ### Describe the solution you'd like
   
   Basically implement what is described in 
https://github.com/apache/arrow-datafusion/issues/6325 for JSON -- and read a 
single large NDJSON (newline-delimited JSON) file in parallel.
   
   
   
   ### Describe alternatives you've considered
   
   Some research may be required -- I am not sure if finding record boundaries 
is feasible.
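
One possible way to find record boundaries, sketched under assumptions: if the input is strict NDJSON (every record on its own line, with any newline inside a string escaped as `\n`), then a raw newline byte always ends a record. A file can then be cut into roughly equal byte ranges and each cut moved forward to the next record start. The names `next_record_start` and `split_ndjson` are hypothetical and are not DataFusion APIs; the sketch works on an in-memory byte slice rather than an object store.

```rust
/// Returns the offset of the first record that begins at or after `pos`.
/// Assumes strict NDJSON: a raw b'\n' only ever appears as a record terminator.
fn next_record_start(data: &[u8], pos: usize) -> usize {
    if pos == 0 {
        return 0;
    }
    if pos >= data.len() {
        return data.len();
    }
    // A record starts exactly at `pos` only if the previous byte is a newline,
    // so the search begins at `pos - 1`.
    match data[pos - 1..].iter().position(|&b| b == b'\n') {
        Some(i) => pos + i, // (pos - 1) + i is the newline; the record starts one byte later
        None => data.len(), // no further newline: no record starts after `pos`
    }
}

/// Splits `data` into at most `partitions` byte ranges `[start, end)` that
/// each begin and end on a record boundary. Empty ranges are dropped.
fn split_ndjson(data: &[u8], partitions: usize) -> Vec<(usize, usize)> {
    let n = partitions.max(1);
    let chunk = (data.len() + n - 1) / n; // ceiling division
    (0..n)
        .map(|i| {
            let start = next_record_start(data, (i * chunk).min(data.len()));
            let end = next_record_start(data, ((i + 1) * chunk).min(data.len()));
            (start, end)
        })
        .filter(|(s, e)| s < e)
        .collect()
}
```

Each returned range could be handed to a separate reader. Because every range uses the same alignment rule for its start and its end, the ranges are contiguous and no record is read twice or skipped. Pretty-printed JSON that spans several lines would break the assumption above.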
   
   ### Additional context
   
   I found this while writing tests for 
https://github.com/apache/arrow-datafusion/issues/8451


