n3world commented on a change in pull request #10255:
URL: https://github.com/apache/arrow/pull/10255#discussion_r630155517
##########
File path: cpp/src/arrow/csv/reader_test.cc
##########
@@ -216,5 +216,83 @@ TEST(StreamingReaderTests, NestedParallelism) {
TestNestedParallelism(thread_pool, table_factory);
}
+TEST(ReaderOptionsTests, SkipRowsAfterNames) {
Review comment:
Actually, after looking at it a bit more, it doesn't have to be moved out
of the reader, but I don't think it can use SkipRows. SkipRows is very simple in
its implementation: it doesn't actually skip rows, but lines in the file. I make
the distinction here because a row can contain values with newlines. If any of
the rows contains a quoted or escaped newline, SkipRows will count that row as
two lines and not one.
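To make the line/row distinction concrete, here is a minimal sketch of a purely line-based skip. `SkipLinesNaive` is a hypothetical helper written for illustration, not Arrow's actual SkipRows code; it only assumes `\n` line endings and ignores CSV quoting, which is the behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not Arrow's implementation): returns the offset just
// past the n-th '\n' in `data`, ignoring CSV quoting entirely. It therefore
// counts physical lines, not logical rows. Returns data.size() if `data`
// contains fewer than n newlines.
inline std::size_t SkipLinesNaive(std::string_view data, std::size_t n) {
  std::size_t pos = 0;
  while (n > 0 && pos < data.size()) {
    const std::size_t nl = data.find('\n', pos);
    if (nl == std::string_view::npos) return data.size();
    pos = nl + 1;  // continue right after the newline
    --n;
  }
  return pos;
}

// Three logical rows; the second row has a quoted value with an embedded
// newline, so the input spans four physical lines.
inline constexpr std::string_view kSample = "a,b\n\"x\ny\",1\nz,2\n";
```

Skipping two "rows" with this helper stops in the middle of the second row: the remaining input starts with `y",1` instead of `z,2`.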
I was thinking it might be better to add a FirstN method to Chunker to
get the Nth occurrence of a line ending. This could then be integrated
into the BlockReader implementations so that rows can be skipped even
beyond the first block. This could also solve ARROW-8527.
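A rough sketch of what such a FirstN-style lookup might do, as a free function rather than a real Chunker method. The name `FirstNRows`, its signature, and the parsing rules are assumptions for illustration only: quote character `"`, doubled quotes inside quoted values, no backslash escaping, and `\n` as the row terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch (not an existing Arrow API): returns the offset just
// past the end of the n-th logical row in `data`, treating newlines inside
// double-quoted values as part of the row. Returns std::string_view::npos if
// `data` does not contain n complete rows.
inline std::size_t FirstNRows(std::string_view data, std::size_t n) {
  if (n == 0) return 0;
  bool in_quotes = false;
  for (std::size_t i = 0; i < data.size(); ++i) {
    const char c = data[i];
    if (c == '"') {
      // A doubled quote ("") toggles twice, so the state ends up unchanged.
      in_quotes = !in_quotes;
    } else if (c == '\n' && !in_quotes) {
      if (--n == 0) return i + 1;  // offset of the first byte after the row
    }
  }
  return std::string_view::npos;
}

// Three logical rows; the second contains a quoted embedded newline.
inline constexpr std::string_view kRows = "a,b\n\"x\ny\",1\nz,2\n";
```

With this input, `FirstNRows(kRows, 2)` lands exactly at the start of `z,2`. To skip rows beyond the first block, a block reader would additionally have to carry the remaining row count and the quote state over to the next block whenever the result is `npos`; that part is left out of the sketch.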