[GitHub] [flink] reswqa opened a new pull request, #21415: [FLINK-30189] HsSubpartitionFileReader may load data that has been consumed from memory

GitBox Mon, 28 Nov 2022 22:45:10 -0800


reswqa opened a new pull request, #21415:
URL: https://github.com/apache/flink/pull/21415


   ## What is the purpose of the change
   
   *In order to solve the problem that data cannot be read from the disk 
correctly after failover, we changed the calculation logical of the buffer's 
readable state in 
[FLINK-29238](https://issues.apache.org/jira/browse/FLINK-29238).  Buffers that 
are greater than consumingOffset and have been released can be pre-load from 
file. However, the update of consumingOffset is asynchronous, If it lags behind 
the actual consumption progress, the buffer will have a chance to be load from 
the disk again. 
   
   IMO, we can record the consumed status of buffer by each consumer in the 
InternalRegion. Only the buffers that have not been consumed and have been 
released will be considered as readable. In the case of failover, a new 
consumerId will be generated, so all buffers will be considered as unconsumed 
and can be correctly read from the disk too.*
   
   
   ## Brief change log
   
     - *`HsFileDataIndex` maintains the buffer's consumed state of each 
consumer.*
     - *`ReadableRegion` is no longer decided by consuming offset.*
   
   ## Verifying this change
   
   This change is already covered by existing tests
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] reswqa opened a new pull request, #21415: [FLINK-30189] HsSubpartitionFileReader may load data that has been consumed from memory

Reply via email to