Jefffrey commented on issue #8824: URL: https://github.com/apache/arrow-datafusion/issues/8824#issuecomment-1912803444
> I think the idea of skipping N rows on the file level doesn't make much sense. What we can probably do is to skip N rows on dataframe level, but again there is no guarantee which exactly 2 rows will be skipped because ordering, shuffling, etc. IMHO it looks more a user task than DataFusion task as the user has more context when executing the query

If I understand the issue correctly, skipping rows at the DataFrame level would not work: the file couldn't even be parsed into a DataFrame in the first place, because its initial lines are not CSV rows.

Would it be possible to specify particular file lines to skip, rather than the first N rows of a file? Though I'm not sure that would be any simpler than skipping the first N rows.
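
To make the "user task" option concrete, here is a minimal sketch of dropping the leading non-CSV lines before the data ever reaches a CSV parser. It uses only the Rust standard library. `skip_leading_lines` is a hypothetical helper, not a DataFusion API, and the sample preamble text is invented for illustration. It assumes the preamble is exactly `n` newline-terminated physical lines.

```rust
use std::io::{self, BufRead, Cursor, Read};

/// Hypothetical helper (not part of DataFusion): consume the first `n`
/// physical lines of `reader` and hand back the reader positioned at
/// whatever follows them.
fn skip_leading_lines<R: BufRead>(mut reader: R, n: usize) -> io::Result<R> {
    let mut discarded = Vec::new();
    for _ in 0..n {
        discarded.clear();
        // `read_until` returns 0 at end of input, so stop early if the
        // file has fewer than `n` lines.
        if reader.read_until(b'\n', &mut discarded)? == 0 {
            break;
        }
    }
    Ok(reader)
}

fn main() -> io::Result<()> {
    // Invented sample: two preamble lines that are not CSV rows,
    // followed by a header and two records.
    let raw = "Report generated by some tool\nexported rows: 2\nid,name\n1,a\n2,b\n";

    let mut rest = String::new();
    skip_leading_lines(Cursor::new(raw), 2)?.read_to_string(&mut rest)?;

    // Only the CSV portion remains and can now be passed to a CSV reader.
    assert_eq!(rest, "id,name\n1,a\n2,b\n");
    println!("{rest}");
    Ok(())
}
```

Because the skipping happens on raw lines before any parsing, it avoids the ordering concern raised in the quote, but it only works per file and requires the caller to know how many lines to drop.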
