[GitHub] [arrow] n3world commented on pull request #10202: ARROW-12673: [C++] Add parser handler for incorrect column counts

GitBox Wed, 12 May 2021 07:18:07 -0700


n3world commented on pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#issuecomment-839810210



   > > In your opinion, would it be reasonable to add two enums to the parser 
options, one for handling too few columns and one for too many, with the values 
ERROR, SKIP and FIX?
   > 
   > Well, I'm not sure I understand in which situation skipping would be the 
right answer. Can you explain a bit more?
   
   I can give you my use case. I have users who upload CSV files to be 
analyzed. Currently, if a CSV is malformed, an error message is returned 
immediately on the first error. I would like the first parsing pass to skip 
any bad rows but keep track of them. If no bad rows are found, the parse simply 
succeeds; otherwise the user is notified that not all of the data could be 
parsed and is told which rows had an issue. The user can then decide how to 
handle the situation, i.e. upload a fixed CSV, keep ignoring the bad lines, or 
reparse the lines by having the parser "fix" them.
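   
   As a rough sketch of this skip-and-track flow, here is a plain Python 
version using only the standard csv module. It is not Arrow's API: the names 
BadRowPolicy and parse_rows are hypothetical, and the "fix" step (pad short 
rows with empty strings, truncate long ones) is just one assumed way a parser 
could repair a row.

```python
import csv
import io
from enum import Enum


class BadRowPolicy(Enum):
    """Hypothetical policy enum mirroring the ERROR / SKIP / FIX values."""
    ERROR = "error"
    SKIP = "skip"
    FIX = "fix"


def parse_rows(text, policy=BadRowPolicy.SKIP):
    """Parse CSV text and return (rows, bad_rows).

    bad_rows is a list of (record_number, fields) pairs for every record whose
    column count differs from the header. The header is record 1.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = len(header)
    rows, bad_rows = [], []
    for record_number, fields in enumerate(reader, start=2):
        if len(fields) == expected:
            rows.append(fields)
            continue
        if policy is BadRowPolicy.ERROR:
            # Current behavior described above: fail on the first bad row.
            raise ValueError(
                f"record {record_number}: expected {expected} columns, "
                f"got {len(fields)}"
            )
        bad_rows.append((record_number, fields))
        if policy is BadRowPolicy.FIX:
            # Assumed repair: pad with empty strings, then cut to the header width.
            rows.append((fields + [""] * expected)[:expected])
        # BadRowPolicy.SKIP: the row is dropped but stays in bad_rows.
    return rows, bad_rows


SAMPLE = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n10,11,12\n"

# First pass: skip bad rows, but report them to the user.
rows, bad_rows = parse_rows(SAMPLE, BadRowPolicy.SKIP)
for record_number, fields in bad_rows:
    print(f"record {record_number} has {len(fields)} columns: {fields}")

# Optional second pass if the user asks for the rows to be "fixed".
fixed_rows, _ = parse_rows(SAMPLE, BadRowPolicy.FIX)
```

   The first pass gives the caller both the good data and the full list of 
problem rows, so the decision to re-upload, ignore, or reparse with FIX can be 
left to the user.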
   
   This behavior is similar to the pandas.read_csv options error_bad_lines and 
warn_bad_lines.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
