Ruiqi Dong created CSV-327:
------------------------------

             Summary: CSVParser applies maxRows to record numbers instead of 
rows produced when setRecordNumber(...) is used
                 Key: CSV-327
                 URL: https://issues.apache.org/jira/browse/CSV-327
             Project: Commons CSV
          Issue Type: Bug
            Reporter: Ruiqi Dong


*Summary*
`CSVParser.Builder#setRecordNumber(...)` sets the next record number to assign. 
This is useful when parsing resumes from the middle of a larger source.

When `CSVFormat.Builder#setMaxRows(...)` is also configured, the iterator 
checks `format.useRow(recordNumber + 1)` before parsing the next row. If the 
resumed record number is greater than `maxRows`, the parser returns no rows at 
all, even though it has not produced any row in this parser instance.
 
*Affected code*
File: `src/main/java/org/apache/commons/csv/CSVParser.java`
{code:java}
private CSVRecord getNextRecord() {
    CSVRecord record = null;
    if (format.useRow(recordNumber + 1)) {
        record = Uncheck.get(CSVParser.this::nextRecord);
    }
    return record;
} {code}
File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
{code:java}
boolean useRow(final long rowNum) {
    return !useMaxRows() || rowNum <= getMaxRows();
} {code}
The check uses the assigned record number as if it were a count of rows 
produced by the current parser.
*Reproducer*
Add this test to `src/test/java/org/apache/commons/csv/CSVParserTest.java`
{code:java}
@Test
void testGetRecordsMaxRowsWithRecordNumberOffset() throws IOException {
    try (CSVParser parser = CSVParser.builder()
            .setReader(new StringReader("a,b\nc,d\n"))
            .setFormat(CSVFormat.DEFAULT.builder().setMaxRows(1).get())
            .setRecordNumber(2)
            .get()) {
        final List<CSVRecord> records = parser.getRecords();
        assertEquals(1, records.size());
        assertEquals(2, records.get(0).getRecordNumber());
        assertValuesEquals(new String[] { "a", "b" }, records.get(0));
    }
} {code}
Run:
{code:java}
mvn -q 
-Dtest=org.apache.commons.csv.CSVParserTest#testGetRecordsMaxRowsWithRecordNumberOffset
 test {code}
Observed behavior:
The test fails
{code:java}
expected: <1> but was: <0> {code}
*Expected behavior:*
`maxRows` should limit how many rows this parser/iterator produces. It should 
not reject the first row solely because the caller configured the next assigned 
record number to start at `2`.
 
`setRecordNumber(...)` is record-number metadata. `setMaxRows(...)` is a 
row-production limit. Coupling these two values breaks resumed parsing: a 
caller cannot both preserve original record numbers and limit the number of 
rows returned.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to