zhijiangW commented on a change in pull request #11507: [FLINK-16587] Add basic CheckpointBarrierHandler for unaligned checkpoint URL: https://github.com/apache/flink/pull/11507#discussion_r403725829
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/SpillingAdaptiveSpanningRecordDeserializer.java ##########

@@ -557,6 +579,28 @@ private void addNextChunkFromMemorySegment(MemorySegment segment, int offset, int numBytes) {
 		}
 	}

+	@Nullable
+	MemorySegment copyToTargetSegment() {
+		// for the case of only partial length, no data
+		final int position = lengthBuffer.position();
+		if (position > 0) {
+			MemorySegment segment = MemorySegmentFactory.allocateUnpooledSegment(lengthBuffer.remaining());
+			segment.put(0, lengthBuffer, lengthBuffer.remaining());
+			lengthBuffer.position(position);
+			return segment;
+		}
+
+		// for the case of full length, partial data in buffer
+		if (recordLength != -1) {
+			// In the PoC we skip the case of a large record whose size exceeds THRESHOLD_FOR_SPILLING,

Review comment:

I have not thought through the solution yet. In my previous PoC I also found this hard to handle, since it is not easy to read the partial data back from the spilled files at the moment, so I left it as a TODO to avoid spending too much effort on it. I think we might not support this large-record case in the MVP, but we should not silently ignore it. Otherwise, once we encounter this case in production, the completed checkpoint cannot be restored correctly because of the missing partial records. Maybe we could discard the unaligned checkpoint in the large-record case, or throw an exception that warns users to tune a proper checkpoint setting?
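The diff copies in-flight bytes out of a buffer without disturbing its read/write position, and the review suggests failing loudly for large spilled records instead of silently dropping them. A minimal standalone sketch of both ideas using plain `java.nio` (the class name `InFlightCopy`, the method names, and the threshold value are hypothetical illustrations, not the Flink API):

```java
import java.nio.ByteBuffer;

public class InFlightCopy {

	// Hypothetical threshold, standing in for THRESHOLD_FOR_SPILLING in the PR.
	static final int THRESHOLD_FOR_SPILLING = 5 * 1024 * 1024;

	/**
	 * Copies the bytes written to {@code buf} so far into a fresh array,
	 * leaving the buffer's own position and limit untouched (the same
	 * save-and-restore effect as in the diff above).
	 */
	static byte[] copyWrittenBytes(ByteBuffer buf) {
		byte[] copy = new byte[buf.position()];
		// duplicate() shares the data but has an independent position/limit,
		// so flipping the view does not disturb the original buffer.
		ByteBuffer view = buf.duplicate();
		view.flip();
		view.get(copy);
		return copy;
	}

	/**
	 * Snapshots the buffered part of an in-flight record, but fails loudly
	 * for records above the spilling threshold rather than producing a
	 * checkpoint that cannot be restored (missing partial record), as the
	 * review comment proposes.
	 */
	static byte[] snapshotRecord(int recordLength, ByteBuffer dataBuffer) {
		if (recordLength > THRESHOLD_FOR_SPILLING) {
			throw new IllegalStateException(
				"Unaligned checkpoint cannot snapshot a spilled record of "
					+ recordLength + " bytes");
		}
		return copyWrittenBytes(dataBuffer);
	}
}
```

Throwing rather than returning an incomplete snapshot matches the review's concern: a checkpoint missing partial records would fail only much later, at restore time, which is far harder to diagnose.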