[ 
https://issues.apache.org/jira/browse/CSV-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086344#comment-18086344
 ] 

Gary D. Gregory commented on CSV-327:
-------------------------------------

Hello [~rickydong]

Thank you for your report.
Feel free to provide a PR on GitHub.

 

> CSVParser applies maxRows to record numbers instead of rows produced when 
> setRecordNumber(...) is used
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CSV-327
>                 URL: https://issues.apache.org/jira/browse/CSV-327
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Ruiqi Dong
>            Priority: Minor
>
> *Summary*
> `CSVParser.Builder#setRecordNumber(...)` sets the next record number to 
> assign. This is useful when parsing resumes from the middle of a larger 
> source.
> When `CSVFormat.Builder#setMaxRows(...)` is also configured, the iterator 
> checks `format.useRow(recordNumber + 1)` before parsing the next row. If the 
> resumed record number is greater than `maxRows`, the parser returns no rows 
> at all, even though it has not produced any row in this parser instance.
>  
> *Affected code*
> File: `src/main/java/org/apache/commons/csv/CSVParser.java`
> {code:java}
> private CSVRecord getNextRecord() {
>     CSVRecord record = null;
>     if (format.useRow(recordNumber + 1)) {
>         record = Uncheck.get(CSVParser.this::nextRecord);
>     }
>     return record;
> } {code}
> File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
> {code:java}
> boolean useRow(final long rowNum) {
>     return !useMaxRows() || rowNum <= getMaxRows();
> } {code}
> The check uses the assigned record number as if it were a count of rows 
> produced by the current parser.
> *Reproducer*
> Add this test to `src/test/java/org/apache/commons/csv/CSVParserTest.java`
> {code:java}
> @Test
> void testGetRecordsMaxRowsWithRecordNumberOffset() throws IOException {
>     try (CSVParser parser = CSVParser.builder()
>             .setReader(new StringReader("a,b\nc,d\n"))
>             .setFormat(CSVFormat.DEFAULT.builder().setMaxRows(1).get())
>             .setRecordNumber(2)
>             .get()) {
>         final List<CSVRecord> records = parser.getRecords();
>         assertEquals(1, records.size());
>         assertEquals(2, records.get(0).getRecordNumber());
>         assertValuesEquals(new String[] { "a", "b" }, records.get(0));
>     }
> } {code}
> Run:
> {code:java}
> mvn -q 
> -Dtest=org.apache.commons.csv.CSVParserTest#testGetRecordsMaxRowsWithRecordNumberOffset
>  test {code}
> Observed behavior:
> The test fails
> {code:java}
> expected: <1> but was: <0> {code}
> *Expected behavior:*
> `maxRows` should limit how many rows this parser/iterator produces. It should 
> not reject the first row solely because the caller configured the next 
> assigned record number to start at `2`.
>  
> `setRecordNumber(...)` is record-number metadata. `setMaxRows(...)` is a 
> row-production limit. Coupling these two values breaks resumed parsing: a 
> caller cannot both preserve original record numbers and limit the number of 
> rows returned.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to