[
https://issues.apache.org/jira/browse/CSV-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087637#comment-18087637
]
Ruiqi Dong commented on CSV-327:
--------------------------------
PR was submitted
> CSVParser applies maxRows to record numbers instead of rows produced when
> setRecordNumber(...) is used
> ------------------------------------------------------------------------------------------------------
>
> Key: CSV-327
> URL: https://issues.apache.org/jira/browse/CSV-327
> Project: Commons CSV
> Issue Type: Bug
> Reporter: Ruiqi Dong
> Priority: Minor
>
> *Summary*
> `CSVParser.Builder#setRecordNumber(...)` sets the next record number to
> assign. This is useful when parsing resumes from the middle of a larger
> source.
> When `CSVFormat.Builder#setMaxRows(...)` is also configured, the iterator
> checks `format.useRow(recordNumber + 1)` before parsing the next row. If the
> resumed record number is greater than `maxRows`, the parser returns no rows
> at all, even though it has not produced any row in this parser instance.
>
> *Affected code*
> File: `src/main/java/org/apache/commons/csv/CSVParser.java`
> {code:java}
> private CSVRecord getNextRecord() {
> CSVRecord record = null;
> if (format.useRow(recordNumber + 1)) {
> record = Uncheck.get(CSVParser.this::nextRecord);
> }
> return record;
> } {code}
> File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
> {code:java}
> boolean useRow(final long rowNum) {
> return !useMaxRows() || rowNum <= getMaxRows();
> } {code}
> The check uses the assigned record number as if it were a count of rows
> produced by the current parser.
> *Reproducer*
> Add this test to `src/test/java/org/apache/commons/csv/CSVParserTest.java`
> {code:java}
> @Test
> void testGetRecordsMaxRowsWithRecordNumberOffset() throws IOException {
> try (CSVParser parser = CSVParser.builder()
> .setReader(new StringReader("a,b\nc,d\n"))
> .setFormat(CSVFormat.DEFAULT.builder().setMaxRows(1).get())
> .setRecordNumber(2)
> .get()) {
> final List<CSVRecord> records = parser.getRecords();
> assertEquals(1, records.size());
> assertEquals(2, records.get(0).getRecordNumber());
> assertValuesEquals(new String[] { "a", "b" }, records.get(0));
> }
> } {code}
> Run:
> {code:java}
> mvn -q
> -Dtest=org.apache.commons.csv.CSVParserTest#testGetRecordsMaxRowsWithRecordNumberOffset
> test {code}
> Observed behavior:
> The test fails
> {code:java}
> expected: <1> but was: <0> {code}
> *Expected behavior:*
> `maxRows` should limit how many rows this parser/iterator produces. It should
> not reject the first row solely because the caller configured the next
> assigned record number to start at `2`.
>
> `setRecordNumber(...)` is record-number metadata. `setMaxRows(...)` is a
> row-production limit. Coupling these two values breaks resumed parsing: a
> caller cannot both preserve original record numbers and limit the number of
> rows returned.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)