my-vegetable-has-exploded commented on issue #8503:
URL: https://github.com/apache/arrow-datafusion/issues/8503#issuecomment-1868475220

   I read the related PRs for Parquet and CSV.
   Parquet's parallel scan splits by row group and CSV's splits by line. Both formats can be partitioned by row, with each partition then emitting RecordBatches.
   I don't think Arrow can be handled the same way, since the Arrow IPC file format is purely column-based.
   But I am wondering whether we can split the scan into several parts and rebuild the whole batch, since there may be more than one array in the file, as sketched below.
   
![image](https://github.com/apache/arrow-datafusion/assets/48236141/8e7a8b19-f302-4678-96a8-d2d3af5f4c56)
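
   To make the idea concrete, here is a minimal sketch of a per-batch parallel scan over an Arrow IPC file, using the arrow crate's `ipc::reader::FileReader`. This is only an illustration, not DataFusion's implementation: the round-robin partitioning and the `data.arrow` file name are assumptions, and a real scan would seek directly to each partition's batches via the block offsets in the IPC footer rather than iterating the whole file in every worker.

```rust
use std::fs::File;
use std::thread;

use arrow::ipc::reader::FileReader;

// Scan one partition's share of the record batches in an IPC file.
// Naive sketch: every worker still iterates (and decodes) the whole
// file; a real implementation would seek straight to its own batches
// using the block offsets stored in the IPC footer.
fn scan_partition(path: &'static str, partition: usize, num_partitions: usize) {
    let file = File::open(path).expect("open IPC file");
    let reader = FileReader::try_new(file, None).expect("read IPC footer");
    for (i, batch) in reader.enumerate() {
        // Hypothetical round-robin assignment of batches to partitions.
        if i % num_partitions == partition {
            let batch = batch.expect("decode record batch");
            println!("partition {partition}: batch {i}, {} rows", batch.num_rows());
        }
    }
}

fn main() {
    let num_partitions = 4;
    let handles: Vec<_> = (0..num_partitions)
        .map(|p| thread::spawn(move || scan_partition("data.arrow", p, num_partitions)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

   One limitation this exposes: a per-batch split only helps when the file contains more than one record batch, so a file written as a single large batch would still scan serially.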
   
   Merry Christmas!

