n3world commented on a change in pull request #10255:
URL: https://github.com/apache/arrow/pull/10255#discussion_r631076862
##########
File path: cpp/src/arrow/csv/reader_test.cc
##########
@@ -216,5 +216,83 @@ TEST(StreamingReaderTests, NestedParallelism) {
TestNestedParallelism(thread_pool, table_factory);
}
+TEST(ReaderOptionsTests, SkipRowsAfterNames) {
Review comment:
My guess is that it is simple because, per the comment, it is intended to
skip corrupt rows, and for that it probably has to be a bit brute force.
Adding a more CSV-aware skip does add some complexity, but that parsing
knowledge is already contained in the BoundaryFinder implementations, so
it already exists. Also, as a user specifying the number of rows to skip, I
would expect CSV rows to be skipped, not file lines.
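To illustrate the difference between the two behaviours, here is a minimal standalone sketch. It does not use Arrow's API: `SkipFileLines` and `SkipCsvRows` are hypothetical helpers, and the quoting rule assumed is plain double-quote quoting with `""` as the escape. With a quoted field that contains a newline, skipping file lines lands in the middle of a field, while skipping CSV rows lands on a row boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Arrow code): returns the offset just past the
// first `n` physical lines, ignoring CSV quoting entirely.
std::size_t SkipFileLines(const std::string& data, std::size_t n) {
  std::size_t pos = 0;
  while (n > 0 && pos < data.size()) {
    std::size_t nl = data.find('\n', pos);
    if (nl == std::string::npos) return data.size();
    pos = nl + 1;
    --n;
  }
  return pos;
}

// Hypothetical helper (not Arrow code): returns the offset just past the
// first `n` CSV rows. A newline inside double quotes does not end a row.
std::size_t SkipCsvRows(const std::string& data, std::size_t n) {
  bool in_quotes = false;
  std::size_t pos = 0;
  while (n > 0 && pos < data.size()) {
    char c = data[pos++];
    if (c == '"') {
      // An escaped quote ("") toggles twice, so the state is unchanged.
      in_quotes = !in_quotes;
    } else if (c == '\n' && !in_quotes) {
      --n;
    }
  }
  return pos;
}
```

For the input `"a,b\n1,\"x\ny\"\n2,z\n"` and a skip count of 2, the line-based helper stops inside the quoted field, whereas the row-based helper stops right before `2,z`.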
If it sways your opinion at all, I did get an implementation working that uses
the BlockReaders, Chunker and BoundaryFinders to skip over the lines; the
parser and everything downstream are unaware of it. It can also skip lines
beyond a single block, satisfying ARROW-8527.
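As a rough illustration of skipping across block boundaries (this is not the implementation from the PR, and `RowSkipper` is a hypothetical name, not an Arrow class), the skipper only needs to carry two pieces of state from one block to the next: how many rows are left to skip and whether the previous block ended inside a quoted field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stateful skipper (not Arrow's Chunker or BoundaryFinder).
// It assumes double-quote quoting with "" as the escape sequence.
class RowSkipper {
 public:
  explicit RowSkipper(std::size_t rows_to_skip) : remaining_(rows_to_skip) {}

  // Consumes one block and returns the offset within it where the skipped
  // region ends. Returns block.size() if the whole block was skipped.
  std::size_t Consume(const std::string& block) {
    std::size_t pos = 0;
    while (remaining_ > 0 && pos < block.size()) {
      char c = block[pos++];
      if (c == '"') {
        in_quotes_ = !in_quotes_;
      } else if (c == '\n' && !in_quotes_) {
        --remaining_;
      }
    }
    return pos;
  }

  // True once the requested number of rows has been skipped.
  bool done() const { return remaining_ == 0; }

 private:
  std::size_t remaining_;
  bool in_quotes_ = false;  // carried across blocks
};
```

A caller would feed blocks to `Consume` until `done()` returns true, then hand the remainder of the last block (from the returned offset onward) to the parser, which never sees the skipped rows.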