Re: [I] support `skip_rows` for `CsvFormat` [arrow-datafusion]

via GitHub Fri, 26 Jan 2024 14:58:54 -0800


Jefffrey commented on issue #8824:
URL: https://github.com/apache/arrow-datafusion/issues/8824#issuecomment-1912803444


   > I think the idea of skipping N rows on the file level doesn't make much
   > sense. What we can probably do is skip N rows at the DataFrame level, but again
   > there is no guarantee which exact 2 rows will be skipped, because of ordering,
   > shuffling, etc. IMHO it looks more like a user task than a DataFusion task, as
   > the user has more context when executing the query.
   
   If I understand the issue correctly, skipping rows at the DataFrame level
would not work, because the file could not be parsed into a DataFrame in the
first place: its initial rows are not CSV rows.
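   To illustrate why the skip has to happen before parsing, here is a minimal sketch in plain Rust (std only, not DataFusion's API) that discards the first `n` physical lines of the raw input and hands the remainder to whatever CSV reader follows. The helper name `skip_leading_lines` and the sample preamble are hypothetical; the sketch assumes the preamble lines end in `\n` and that counting physical lines, not CSV records, is what the user wants.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper (not a DataFusion API): discard the first `n`
/// physical lines of `input` and return the remaining bytes, which could
/// then be passed to a CSV parser.
fn skip_leading_lines<R: Read>(input: R, n: usize) -> std::io::Result<Vec<u8>> {
    let mut reader = BufReader::new(input);
    let mut discard = Vec::new();
    for _ in 0..n {
        discard.clear();
        // read_until consumes bytes up to and including '\n';
        // a return value of 0 means EOF was reached before n lines.
        if reader.read_until(b'\n', &mut discard)? == 0 {
            break;
        }
    }
    // Everything after the skipped lines is returned untouched.
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;
    Ok(rest)
}

fn main() -> std::io::Result<()> {
    // Hypothetical file: two non-CSV preamble lines, then a normal CSV body.
    let raw = "Exported by some-tool v1\n# preamble line 2\nid,value\n1,10\n2,20\n";
    let csv = skip_leading_lines(raw.as_bytes(), 2)?;
    print!("{}", String::from_utf8_lossy(&csv)); // prints the header and two data rows
    Ok(())
}
```

Because the skip counts raw lines rather than parsed records, the preamble never has to be valid CSV, which is exactly the case a DataFrame-level skip cannot handle.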
   
   Would it be possible to list the specific file lines to skip, instead of the
first N rows of a file? Though I'm not sure that would be any simpler than
skipping the first N rows.
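   As a rough sketch of the "skip specific lines" variant, the std-only Rust below drops the physical lines whose 1-based numbers appear in a set and keeps everything else verbatim. The function name `skip_specific_lines` and the sample input are hypothetical and do not correspond to any existing DataFusion option.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): drop the physical lines whose
/// 1-based line numbers are in `skip`, keeping all other lines (with their
/// trailing '\n') unchanged.
fn skip_specific_lines(input: &str, skip: &HashSet<usize>) -> String {
    input
        .split_inclusive('\n') // each item keeps its own '\n'
        .enumerate()
        .filter(|(i, _)| !skip.contains(&(i + 1))) // convert 0-based index to 1-based line number
        .map(|(_, line)| line)
        .collect()
}

fn main() {
    // Hypothetical file with a preamble on line 1 and a stray marker on line 4.
    let raw = "preamble\nid,value\n1,10\n--- page break ---\n2,20\n";
    let skip: HashSet<usize> = HashSet::from([1, 4]);
    print!("{}", skip_specific_lines(raw, &skip)); // prints the header and two data rows
}
```

One caveat on the design: line numbers count physical lines, so a quoted CSV field containing an embedded newline would shift the numbering of every line after it. Skipping only the first N lines avoids that problem as long as the preamble itself contains no quoted fields.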


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

