Matt,

We're probably thinking of similar concepts, but I wanted to be sure. My thought here is that the RecordReader interface keeps track of how many records have been read. If a call to read the next record bails for any given implementation, one could reliably check how many records have been read and thus know the index of the failed one. I've not looked closely at the API, so there is probably a cleaner model. But this feels like a RecordReader-level capability rather than something specific to any particular implementation such as CSVRecordReader.
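To make the idea concrete, here is a rough sketch of counting at the reader level. Everything in it is an assumption for illustration: `SimpleRecordReader`, `CountingRecordReader`, and `RecordCountSketch` are hypothetical names, not the actual NiFi API, and the toy reader simply treats any line without a comma as malformed.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a record reader; NOT the actual NiFi RecordReader interface.
interface SimpleRecordReader {
    // Returns the next record, or null when the input is exhausted.
    String nextRecord() throws IOException;
}

// Wraps any reader implementation and counts the records read successfully.
final class CountingRecordReader implements SimpleRecordReader {
    private final SimpleRecordReader delegate;
    private long recordsRead = 0;

    CountingRecordReader(SimpleRecordReader delegate) {
        this.delegate = delegate;
    }

    @Override
    public String nextRecord() throws IOException {
        // If the delegate throws, the count is left unchanged.
        String record = delegate.nextRecord();
        if (record != null) {
            recordsRead++;
        }
        return record;
    }

    long getRecordsRead() {
        return recordsRead;
    }
}

final class RecordCountSketch {
    // Toy reader (assumption): a line without a comma counts as malformed.
    static SimpleRecordReader commaReader(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return () -> {
            if (!it.hasNext()) {
                return null;
            }
            String line = it.next();
            if (!line.contains(",")) {
                throw new IOException("malformed record");
            }
            return line;
        };
    }

    // Reads all records; returns -1 on success, else the 1-based index of the failed record.
    static long firstFailedRecord(List<String> lines) {
        CountingRecordReader reader = new CountingRecordReader(commaReader(lines));
        try {
            while (reader.nextRecord() != null) {
                // keep reading until the input is exhausted
            }
            return -1;
        } catch (IOException e) {
            return reader.getRecordsRead() + 1;
        }
    }
}
```

Because the count lives in the wrapper, it works the same for any implementation underneath. Note that this gives a record index, which is not necessarily the line number in the file: a header line or a quoted field that spans several lines would make the two differ.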
Thanks

On Thu, Jan 19, 2023 at 1:59 PM Matt Burgess <mattyb...@apache.org> wrote:

> I was thinking the same, CSVRecordReader could keep track of the
> number of records read and if an exception is thrown during iteration
> over reading the records, we can output the number of records read
> successfully.
>
> On Thu, Jan 19, 2023 at 3:47 PM Joe Witt <joe.w...@gmail.com> wrote:
> >
> > Dan,
> >
> > Seems like our record reader mechanism should offer the concept of tracking
> > which record it is on such that this could be logged. It looks from a
> > quick check like we track record count on writing so something similar on
> > the interface of the reader could be quite helpful.
> >
> > Perhaps best to file a JIRA. Someone else might have a better idea of what
> > you can do now.
> >
> > Thanks
> >
> > On Thu, Jan 19, 2023 at 1:39 PM Dan S <dsti...@gmail.com> wrote:
> >
> > > For both QueryRecord and ValidateRecord when I use a CSVReader on a file
> > > which has different delimiters than the rest of the file, the error message
> > > logged does not include the line number where the parsing failed. When
> > > looking at the code, I did not see any hooks for getting that information.
> > > Is there a way to get the line number so it would be easy to identify
> > > which lines would need to be fixed?
> > >
> >