StephanEwen commented on pull request #13724: URL: https://github.com/apache/flink/pull/13724#issuecomment-722365033
This looks pretty good to me; we could merge it as it is.

One idea for an improvement would be to not use the "skipRecordsCount" at all here. I fear this can lead to surprises with ORC due to pushed-down predicates. If, after an update, the predicate were more selective, ORC itself would filter out more rows and we would then skip too many records later.

What we could do is the following: the `VectorizedRowBatch` has the `int[] selected` array, which has the positions of the rows. We could pass that array to the `ColumnarRowIterator` instead of the `startingOffset`. When returning the next record, it would set that position on the result, rather than incrementing the skipCount.

What do you think?
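
To make the concern concrete, here is a minimal, self-contained sketch. It does not use the actual Flink or ORC classes: `SelectedPositionSketch`, `resumeBySkipCount`, and `resumeByPosition` are hypothetical names for illustration, and a plain `int[]` stands in for the `selected` array of a `VectorizedRowBatch`. It assumes the positions in `selected` are in ascending order and that the same batch is re-read after restore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in; NOT the actual Flink/ORC reader code.
final class SelectedPositionSketch {

    // Resume by skip count: drop the first `skipCount` rows that survived
    // the (possibly changed) pushed-down predicate.
    static List<Integer> resumeBySkipCount(int[] selected, int skipCount) {
        List<Integer> rows = new ArrayList<>();
        for (int i = skipCount; i < selected.length; i++) {
            rows.add(selected[i]);
        }
        return rows;
    }

    // Resume by position: keep every surviving row whose position in the
    // batch lies after the last position that was already returned.
    static List<Integer> resumeByPosition(int[] selected, int lastReturnedPosition) {
        List<Integer> rows = new ArrayList<>();
        for (int position : selected) {
            if (position > lastReturnedPosition) {
                rows.add(position);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Before the update, the predicate keeps rows 0, 2, 3 and 5 of the batch.
        int[] beforeUpdate = {0, 2, 3, 5};

        // The reader returned two records (rows 0 and 2) before the checkpoint.
        int skipCount = 2;
        int lastReturnedPosition = beforeUpdate[skipCount - 1];

        // After the update, a more selective predicate keeps only rows 3 and 5.
        int[] afterUpdate = {3, 5};

        // Skip count: rows 3 and 5 are skipped although they were never returned.
        System.out.println(resumeBySkipCount(afterUpdate, skipCount)); // prints []

        // Position: rows 3 and 5 are still returned.
        System.out.println(resumeByPosition(afterUpdate, lastReturnedPosition)); // prints [3, 5]
    }
}
```

In this sketch, the skip count silently drops rows 3 and 5 once the predicate becomes more selective, while the position taken from `selected` stays valid because it refers to the row's place in the batch rather than to how many rows passed the filter.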
