n3world commented on pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#issuecomment-836840781


   > > At a minimum I would like to be able to skip the rows and collect 
information about the skipped rows so it could be presented to a user saying 
when and where malformed rows were found
   > 
   > I agree that being able to point the row number where an error occurred is 
useful, but we shouldn't need a callback for that.
   
   For that alone, no. But once you consider the combinations of ways these 
rows could be handled, it gets complex. For both short rows and long rows you 
could error, skip, or fix, and if you don't error you have to decide whether 
the row is reported or handled silently. Describing that takes 5 options 
(error, silent skip, reported skip, silent fix, reported fix) for each of short 
and long rows, and then you would need to express any combination of them. The 
distinction between a silent skip and a reported skip matters because currently 
the best way to report a row is to include its entire text, and if many rows 
need to be reported that could add noticeable overhead for a caller who just 
wants the handling done silently. Because of this, I was thinking it would be 
easier to expose a callback with some predefined simple implementations. That 
way more complex options could be implemented by the user.
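
To make the callback idea concrete, here is a rough sketch in Python rather than the C++ of this PR. Every name in it (`Action`, `InvalidRow`, `CollectSkipped`, `parse`, and so on) is hypothetical and not an existing Arrow API; it only illustrates how a single handler plus a few predefined implementations could cover the error/skip/fix and silent/reported combinations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    """What the parser should do with a malformed row (hypothetical)."""
    ERROR = "error"
    SKIP = "skip"
    FIX = "fix"


@dataclass
class InvalidRow:
    """Details passed to the handler (hypothetical shape)."""
    number: int
    expected_columns: int
    fields: List[str]


Handler = Callable[[InvalidRow], Action]


# Predefined simple handlers a library could ship.
def always_error(row: InvalidRow) -> Action:
    return Action.ERROR


def skip_silently(row: InvalidRow) -> Action:
    return Action.SKIP


def pad_or_truncate(row: InvalidRow) -> Action:
    return Action.FIX


class CollectSkipped:
    """Skips malformed rows and remembers them for later reporting."""

    def __init__(self) -> None:
        self.rows: List[InvalidRow] = []

    def __call__(self, row: InvalidRow) -> Action:
        self.rows.append(row)
        return Action.SKIP


def parse(lines: List[str], expected_columns: int, handler: Handler) -> List[List[str]]:
    """Toy comma splitter that defers every malformed row to the handler."""
    out: List[List[str]] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split(",")
        if len(fields) != expected_columns:
            action = handler(InvalidRow(number, expected_columns, fields))
            if action is Action.ERROR:
                raise ValueError(
                    f"row {number}: expected {expected_columns} columns, got {len(fields)}"
                )
            if action is Action.SKIP:
                continue
            # FIX: pad short rows with empty fields, truncate long rows.
            fields = (fields + [""] * expected_columns)[:expected_columns]
        out.append(fields)
    return out
```

A caller who wants reporting would pass a `CollectSkipped()` instance and inspect its `rows` afterwards, while a caller who wants silence passes `skip_silently` and pays nothing for row capture. Handling short and long rows differently is just an `if` inside a user-written handler.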
   
   If we wanted to avoid the callback and still support that matrix of options, 
the best way might be two enums, one for short rows and one for long rows, plus 
a mechanism to track the rows that are to be reported. 
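
For comparison, the enum-based alternative might look roughly like the sketch below. Again this is Python with made-up names (`RowPolicy`, `RowOptions`, `parse_with_policies`), not anything that exists in Arrow; the returned `reported` list stands in for whatever tracking mechanism would actually be used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class RowPolicy(Enum):
    """The five per-row choices described above (hypothetical)."""
    ERROR = "error"
    SKIP_SILENT = "skip_silent"
    SKIP_REPORT = "skip_report"
    FIX_SILENT = "fix_silent"
    FIX_REPORT = "fix_report"


@dataclass
class RowOptions:
    """One policy for short rows and one for long rows."""
    short_rows: RowPolicy = RowPolicy.ERROR
    long_rows: RowPolicy = RowPolicy.ERROR


REPORTING = (RowPolicy.SKIP_REPORT, RowPolicy.FIX_REPORT)
FIXING = (RowPolicy.FIX_SILENT, RowPolicy.FIX_REPORT)


def parse_with_policies(
    lines: List[str], expected_columns: int, options: RowOptions
) -> Tuple[List[List[str]], List[Tuple[int, str]]]:
    """Toy comma splitter returning (rows, reported)."""
    rows: List[List[str]] = []
    reported: List[Tuple[int, str]] = []  # (row number, full row text)
    for number, line in enumerate(lines, start=1):
        fields = line.split(",")
        if len(fields) == expected_columns:
            rows.append(fields)
            continue
        is_short = len(fields) < expected_columns
        policy = options.short_rows if is_short else options.long_rows
        if policy is RowPolicy.ERROR:
            raise ValueError(
                f"row {number}: expected {expected_columns} columns, got {len(fields)}"
            )
        if policy in REPORTING:
            # Only the reporting policies pay for keeping the row text.
            reported.append((number, line))
        if policy in FIXING:
            # Pad short rows with empty fields, truncate long rows.
            rows.append((fields + [""] * expected_columns)[:expected_columns])
    return rows, reported
```

This keeps the options declarative, but any behavior outside the fixed five policies (for example, fixing only rows that are short by one column) would need yet another enum value, which is the complexity the callback avoids.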
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

