bbannier commented on issue #8824: URL: https://github.com/apache/datafusion/issues/8824#issuecomment-2078835858
I came across this issue while looking for a way to deal with CSV files containing comments (in my case: lines starting with `# `). Please let me know if I should open a new issue for that. I was originally looking for a way to replace the reader or to hook in after the builtin one, but from reading the comments above this probably does not fit `datafusion`'s input handling. _Skip first n rows_ wouldn't help in my case, but a way to hook into the CSV reader by providing a line-based filter function might (naively: a `CsvOptions` field like `filter: FnOnce(&str) -> bool`). Maybe something like this could even be generalized to a line-based transformer function a la `filter_map` (it would probably need to return some `Option<Cow>` so as not to penalize use cases that only filter but do not transform). If OP's lines to skip can be clearly identified, this might address their use case as well.
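To make the `filter_map` idea concrete, here is a minimal standalone sketch using only the Rust standard library. It is not a `datafusion` API: `preprocess_lines` and `skip_comments` are hypothetical names, and the sketch preprocesses the text before any CSV reader sees it rather than hooking into the reader itself. It takes an `FnMut` because the callback runs once per line, and it assumes no quoted field contains an embedded newline.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of datafusion): applies a line-based,
/// `filter_map`-style callback to CSV text before it is parsed.
///
/// - `None` drops the line.
/// - `Some(Cow::Borrowed(..))` keeps the line without allocating.
/// - `Some(Cow::Owned(..))` replaces the line with new content.
///
/// Line endings are normalized to `\n` in the output.
/// Assumption: no quoted field spans multiple lines.
pub fn preprocess_lines<'a, F>(input: &'a str, mut transform: F) -> String
where
    F: FnMut(&'a str) -> Option<Cow<'a, str>>,
{
    let mut out = String::with_capacity(input.len());
    for line in input.lines() {
        if let Some(kept) = transform(line) {
            out.push_str(&kept);
            out.push('\n');
        }
    }
    out
}

/// Example callback: drop comment lines starting with `#`,
/// pass everything else through unchanged (borrowed, no copy).
pub fn skip_comments(line: &str) -> Option<Cow<'_, str>> {
    if line.starts_with('#') {
        None
    } else {
        Some(Cow::Borrowed(line))
    }
}
```

Calling `preprocess_lines(text, skip_comments)` returns the text with comment lines removed, which could then be handed to whatever CSV reader is in use. Callers that only filter never allocate per line, while a callback that rewrites a line returns `Cow::Owned` instead.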
