[GitHub] [arrow] KarateSnowMachine opened a new pull request, #36191: PARQUET-36189:[C++]: Parquet StreamReader::SkipRows() skips to incorrect place in mult-row-group files

via GitHub Tue, 20 Jun 2023 14:05:37 -0700


KarateSnowMachine opened a new pull request, #36191:
URL: https://github.com/apache/arrow/pull/36191


   ### Rationale for this change
   
   Parquet `StreamReader::SkipRows()` skips to the wrong place in files with 
multiple row groups because the row offset within the current row group is 
calculated incorrectly. 
   
   ### What changes are included in this PR?
   
   A unit test case demonstrating the failure and a trivial fix. 
   
   ### Are these changes tested?
   
   Yes 
   
   ### Are there any user-facing changes?
   
   No
   
   
   I am not sure how critical this bug is, given how long it has existed in the 
code without anyone seeming to notice. There are two manifestations of this bug 
that might give the user the wrong impression about what is in their data: 
   
   * sometimes `SkipRows()` returns a negative value, which is unexpected given 
the nature of the API, so the user should realize something is wrong (this is 
how I discovered the bug)
   * the `SkipRows()` call sets the `eof` flag prematurely, which might lead 
the user to think there is less data in the file than there is. 
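
To illustrate how a wrong offset calculation could produce both symptoms, here is a minimal toy model. It is a sketch under stated assumptions, not the Arrow source: `ToyReader`, `SkipRowsToy`, the field names, and the `use_buggy_offset` switch are all hypothetical, and the exact expression in the real `StreamReader` may differ. The model assumes the reader tracks an absolute row index plus the absolute index of the first row in the current row group, and that the faulty variant subtracts the group offset instead of using the position relative to it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a row-group-aware reader. NOT the Arrow implementation;
// every name in this block is hypothetical.
struct ToyReader {
  std::vector<int64_t> group_sizes;  // number of rows in each row group
  std::size_t group = 0;             // index of the current row group
  int64_t group_offset = 0;          // absolute index of the group's first row
  int64_t current_row = 0;           // absolute index of the next row to read
  bool eof = false;
};

// Skips up to n rows and returns how many rows were skipped.
// use_buggy_offset selects the assumed faulty calculation.
int64_t SkipRowsToy(ToyReader& r, int64_t n, bool use_buggy_offset) {
  if (r.group >= r.group_sizes.size()) r.eof = true;  // nothing to read
  int64_t remaining = n;
  while (!r.eof && remaining > 0) {
    const int64_t rows_in_group = r.group_sizes[r.group];
    // Rows left in this group = group size minus the position inside the group.
    // The assumed bug treats the absolute row index as if it were relative.
    const int64_t left_in_group =
        use_buggy_offset ? rows_in_group - r.current_row - r.group_offset
                         : rows_in_group - (r.current_row - r.group_offset);
    if (remaining >= left_in_group) {
      // Consume the rest of this group and move on to the next one.
      remaining -= left_in_group;
      r.current_row += left_in_group;
      r.group_offset += rows_in_group;
      if (++r.group == r.group_sizes.size()) r.eof = true;
    } else {
      // The skip ends inside the current group.
      r.current_row += remaining;
      remaining = 0;
    }
  }
  return n - remaining;
}
```

In this model, skipping 12 rows in a file with three row groups of 10 rows each returns 12 and leaves `eof` unset with the correct calculation. With the assumed faulty calculation, the same call returns a negative count and sets `eof` while rows are still unread. Both variants agree when the file has a single row group, because the group offset is then always 0, which is consistent with the bug only showing up in multi-row-group files.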


