[GitHub] [arrow-datafusion] jorgecarleitao opened a new issue #486: Json inference of multiple files is brittle

GitBox Wed, 02 Jun 2021 22:17:17 -0700


jorgecarleitao opened a new issue #486:
URL: https://github.com/apache/arrow-datafusion/issues/486



   Currently, we run the inference on a per-file basis, but the limit on the 
number of records is a single budget shared across all files. This means that 
if the first file has 1000 entries and the second also has 1000, and we run the 
inference with a max of 1000 rows, the first file uses up the whole budget and 
the inference will be based on that file alone.
   
   IMO we should distribute the rows at least evenly across the files we are 
inferring from. In the case above, this would correspond to 500 rows for each 
file.
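
A minimal sketch of the even split proposed above. The function name `rows_per_file` and its signature are hypothetical and are not part of the DataFusion API; the sketch only computes how a shared row budget could be divided, and does not read any files or handle files shorter than their share.

```rust
/// Hypothetical helper (not a DataFusion API): split a total row budget
/// across `num_files` files as evenly as possible.
/// When the budget does not divide evenly, the first
/// `max_rows % num_files` files each get one extra row.
fn rows_per_file(max_rows: usize, num_files: usize) -> Vec<usize> {
    // No files means there is nothing to assign a budget to.
    if num_files == 0 {
        return Vec::new();
    }
    let base = max_rows / num_files;
    let extra = max_rows % num_files;
    (0..num_files)
        .map(|i| if i < extra { base + 1 } else { base })
        .collect()
}
```

For the case in this issue, `rows_per_file(1000, 2)` gives each of the two files a budget of 500 rows, and the per-file budgets always sum to the original limit.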
   


