my-vegetable-has-exploded commented on issue #8503: URL: https://github.com/apache/arrow-datafusion/issues/8503#issuecomment-1868475220
I read the related PRs for Parquet and CSV. Parquet's parallel scan is based on row groups and CSV's is based on lines: both can be split by row and then emit RecordBatches. I don't think Arrow can be handled like that, since an Arrow file is purely column-based. But I am wondering whether we could split the scan into several parts and then rebuild the whole batch, since there may be more than one array in the file (see the sketch below).

Merry Christmas!
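For illustration only, here is a minimal sketch of what batch-level splitting could look like with arrow-rs. It is not DataFusion's implementation; `read_partition` and its parameters are hypothetical names, and it assumes the IPC file footer lets us seek to an arbitrary batch via `FileReader::set_index`.

```rust
use std::fs::File;

use arrow::error::Result;
use arrow::ipc::reader::FileReader;
use arrow::record_batch::RecordBatch;

/// Hypothetical helper: read only the record batches assigned to one
/// scan partition of an Arrow IPC file.
fn read_partition(
    path: &str,
    partition: usize,
    num_partitions: usize,
) -> Result<Vec<RecordBatch>> {
    let file = File::open(path)?;
    let mut reader = FileReader::try_new(file, None)?;

    // The IPC footer indexes every batch, so we can partition by batch index.
    let total = reader.num_batches();
    let chunk = (total + num_partitions - 1) / num_partitions;
    let start = partition * chunk;
    if start >= total {
        return Ok(Vec::new()); // this partition gets no batches
    }
    let end = (start + chunk).min(total);

    // Seek directly to this partition's first batch.
    reader.set_index(start)?;

    let mut batches = Vec::with_capacity(end - start);
    for _ in start..end {
        match reader.next() {
            Some(batch) => batches.push(batch?),
            None => break,
        }
    }
    Ok(batches)
}
```

Each partition would then emit its batches independently. Note that, unlike CSV, no batch is split by row here, so the achievable parallelism would be capped by the number of batches in the file.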
