bbannier commented on issue #8824:
URL: https://github.com/apache/datafusion/issues/8824#issuecomment-2078835858

   I came across this issue since I was looking for a way to deal with CSV 
files containing comments (in my case: lines starting with `# `). Please let me 
know if I should open a new issue for that.
   
   I was originally looking for a way to replace the reader or to hook in after 
the builtin one, but from reading the comments above this probably does not fit 
`datafusion`'s input handling.
   
   _Skip first n rows_ wouldn't help in my case, but a way to hook into the CSV 
reader by providing a line-based filter function might (naively: a 
`CsvOptions` field like `filter: Fn(&str) -> bool`; it would be called once 
per line, so `FnOnce` would not be enough). Maybe something like this could 
even be generalized to a line-based transformer function à la `filter_map` 
(it would probably need to return some `Option<Cow<str>>` so as not to 
penalize use cases which only filter but do not transform). If OP's lines to 
skip can be clearly identified, this might address their use case as well.
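
   To make the idea concrete, here is a minimal sketch of such a line-based 
`filter_map` hook, written against the standard library only. It is not 
`datafusion` or `arrow` API: `filter_map_lines`, `skip_comments` and 
`semicolons_to_commas` are hypothetical names, and the sketch preprocesses a 
whole string instead of hooking into the actual reader.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of datafusion): calls `f` on every line of
/// `input`, drops lines for which it returns `None`, and joins the kept
/// lines with '\n'.
fn filter_map_lines<F>(input: &str, mut f: F) -> String
where
    F: for<'a> FnMut(&'a str) -> Option<Cow<'a, str>>,
{
    let mut out = String::with_capacity(input.len());
    for line in input.lines() {
        if let Some(kept) = f(line) {
            out.push_str(&kept);
            out.push('\n');
        }
    }
    out
}

/// Filter-only callback: drops lines starting with '#'. Kept lines are
/// returned as `Cow::Borrowed`, so the callback itself does not allocate.
fn skip_comments(line: &str) -> Option<Cow<'_, str>> {
    if line.starts_with('#') {
        None
    } else {
        Some(Cow::Borrowed(line))
    }
}

/// Transforming callback: drops comment lines and rewrites ';' separators
/// to ','. Only lines that actually change are returned as `Cow::Owned`.
fn semicolons_to_commas(line: &str) -> Option<Cow<'_, str>> {
    if line.starts_with('#') {
        None
    } else if line.contains(';') {
        Some(Cow::Owned(line.replace(';', ",")))
    } else {
        Some(Cow::Borrowed(line))
    }
}
```

   Calling `filter_map_lines(raw, skip_comments)` on the raw file contents 
would yield text that could be handed to the regular CSV reader. Note that a 
purely line-based hook like this cannot tell a comment line from a quoted 
field that contains a newline followed by `#`, so it is only safe for inputs 
without multi-line quoted fields.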


