sumihehe commented on issue #2346:
URL: https://github.com/apache/hudi/issues/2346#issuecomment-758591316


   > @sumihehe Did you get a chance to look at above? It'll be helpful if you 
can provide more information.
   
   - Around 10,000 rows (1w) when the case happened
   - Around 60 deltacommits and 0 commits when the case happened
   - It returns correctly on the *mor_ro table
   - It could not be reproduced after I made more deltacommits/commits on the table 
later. So it returns correctly now with HoodieParquetRealtimeInputFormat or 
   
   I think it will occur under these conditions:
   1. the query runs on an rt table
   2. the Hive query has predicate pushdown
   3. there are at least 3 splits (thus at least 3 recordReaders in 
HoodieCombineRealtimeRecordReader), and the records that satisfy the predicate are 
in a split located toward the back of the List
   4. 2 recordReaders in succession return false from _this.currentRecordReader.next(key, 
value)_, because the predicate pushdown has filtered out the baseFile.
   
   Condition 4 causes _HoodieCombineRealtimeRecordReader::next(NullWritable 
key, ArrayWritable value)_ to return false, and the reader stops reading. So the 
records that satisfy the predicate are in the remaining recordReaders but 
cannot be read.
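
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not the Hudi source: `SplitReader`, `CombinedReader`, `readerOf`, and `sampleReaders` are hypothetical names, and the "advance only once" branch is an assumption modeled on the behavior described above. An empty reader stands for a split whose base file was filtered out entirely by the pushed-down predicate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class CombineReaderSketch {

    /** Hypothetical stand-in for a per-split record reader (not a Hudi or Hadoop type). */
    interface SplitReader {
        boolean next(StringBuilder value);
    }

    /** Builds a reader over fixed records; passing no records models a fully filtered split. */
    static SplitReader readerOf(String... records) {
        Iterator<String> it = Arrays.asList(records).iterator();
        return value -> {
            if (!it.hasNext()) {
                return false;
            }
            value.setLength(0);
            value.append(it.next());
            return true;
        };
    }

    /** Hypothetical combined reader that chains several split readers. */
    static final class CombinedReader {
        private final Deque<SplitReader> pending;
        private final boolean skipEmptyReaders;
        private SplitReader current;

        CombinedReader(List<SplitReader> readers, boolean skipEmptyReaders) {
            this.pending = new ArrayDeque<>(readers);
            this.skipEmptyReaders = skipEmptyReaders;
            this.current = pending.poll(); // null when there are no readers at all
        }

        boolean next(StringBuilder value) {
            if (current == null) {
                return false;
            }
            if (current.next(value)) {
                return true;
            }
            if (!skipEmptyReaders) {
                // Assumed problematic behavior: advance exactly once and trust the result.
                // If the next reader is also empty, this returns false while readers remain.
                if (pending.isEmpty()) {
                    return false;
                }
                current = pending.poll();
                return current.next(value);
            }
            // Possible fix: keep advancing until a reader yields a record or none remain.
            while (!pending.isEmpty()) {
                current = pending.poll();
                if (current.next(value)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Two empty splits in a row, followed by the split that holds the matching record. */
    static List<SplitReader> sampleReaders() {
        return Arrays.asList(readerOf(), readerOf(), readerOf("match"));
    }

    /** Reads until next() returns false and collects every record seen. */
    static List<String> drain(CombinedReader reader) {
        List<String> out = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        while (reader.next(value)) {
            out.add(value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("advance once: " + drain(new CombinedReader(sampleReaders(), false)));
        System.out.println("skip empties: " + drain(new CombinedReader(sampleReaders(), true)));
    }
}
```

With three readers where the first two are empty, the "advance once" variant stops before reaching the third reader, which matches the symptom of rows missing from the rt table. The looping variant is one possible way to avoid that, under the assumptions stated above.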



