n3world commented on pull request #10202: URL: https://github.com/apache/arrow/pull/10202#issuecomment-839810210
> > In your opinion, would it be reasonable to add two enums to the parser options, one for handling too few columns and one for too many, with the values ERROR, SKIP and FIX?
>
> Well, I'm not sure I understand in which situation skipping would be the right answer. Can you explain a bit more?

I can give you my use case. I have users who upload CSV files to be analyzed. Currently, if a CSV file is malformed, an error message is returned immediately on the first error. I would like the first parsing pass to skip any bad rows but keep track of them: if no bad rows are found, parsing succeeds; otherwise, the user is notified that not all of the data could be parsed and is shown every row that had an issue. The user can then decide how to handle the situation, i.e. upload a fixed CSV, keep ignoring the bad lines, or reparse them and have the parser "fix" them. This behavior is similar to the `error_bad_lines` and `warn_bad_lines` options of `pandas.read_csv`.
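To make the proposed ERROR / SKIP / FIX behavior concrete, here is a minimal sketch in plain Python using the standard `csv` module rather than Arrow's parser. `ColumnCountPolicy`, `ParseResult` and `parse_csv` are hypothetical names for illustration only, not Arrow or pandas API, and treating "fix" as padding missing fields with empty strings and truncating extra ones is an assumption about what that option would do.

```python
import csv
import enum
import io
from dataclasses import dataclass, field


class ColumnCountPolicy(enum.Enum):
    """Hypothetical policy, one per case (too few / too many columns)."""
    ERROR = "error"  # stop parsing at the first bad row
    SKIP = "skip"    # drop the row but record it so the user can be told
    FIX = "fix"      # assumed meaning: pad missing fields, truncate extra ones


@dataclass
class ParseResult:
    rows: list = field(default_factory=list)      # rows kept (possibly fixed)
    bad_rows: list = field(default_factory=list)  # (line number, raw fields) of skipped rows


def parse_csv(text, too_few=ColumnCountPolicy.ERROR, too_many=ColumnCountPolicy.ERROR):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    ncols = len(header)
    result = ParseResult()
    for fields in reader:
        if len(fields) == ncols:
            result.rows.append(fields)
            continue
        # Pick the policy that matches how this row is malformed.
        policy = too_few if len(fields) < ncols else too_many
        line = reader.line_num  # physical line of this record (header is line 1)
        if policy is ColumnCountPolicy.ERROR:
            raise ValueError(f"line {line}: expected {ncols} columns, got {len(fields)}")
        if policy is ColumnCountPolicy.SKIP:
            result.bad_rows.append((line, fields))
        else:  # FIX: pad with empty strings, then cut to exactly ncols fields
            result.rows.append((fields + [""] * ncols)[:ncols])
    return header, result
```

With both policies set to SKIP, a first pass returns the good rows plus a list of every bad line, which is what would let the user choose between uploading a corrected file, ignoring those lines, or reparsing with FIX.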