StephanEwen commented on a change in pull request #13447:
URL: https://github.com/apache/flink/pull/13447#discussion_r492720319
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java
##########
@@ -109,89 +94,58 @@
 		}
 	}

-	protected void emit(T record, int targetChannel) throws IOException, InterruptedException {
+	protected void emit(T record, int targetSubpartition) throws IOException {
 		checkErroneous();

-		serializer.serializeRecord(record);
-
-		// Make sure we don't hold onto the large intermediate serialization buffer for too long
-		copyFromSerializerToTargetChannel(targetChannel);
-	}
-
-	/**
-	 * @param targetChannel
-	 * @return <tt>true</tt> if the intermediate serialization buffer should be pruned
-	 */
-	protected boolean copyFromSerializerToTargetChannel(int targetChannel) throws IOException, InterruptedException {
-		// We should reset the initial position of the intermediate serialization buffer before
-		// copying, so the serialization results can be copied to multiple target buffers.
-		serializer.reset();
-
-		boolean pruneTriggered = false;
-		BufferBuilder bufferBuilder = getBufferBuilder(targetChannel);
-		SerializationResult result = serializer.copyToBufferBuilder(bufferBuilder);
-		while (result.isFullBuffer()) {
-			finishBufferBuilder(bufferBuilder);
-
-			// If this was a full record, we are done. Not breaking out of the loop at this point
-			// will lead to another buffer request before breaking out (that would not be a
-			// problem per se, but it can lead to stalls in the pipeline).
-			if (result.isFullRecord()) {
-				pruneTriggered = true;
-				emptyCurrentBufferBuilder(targetChannel);
-				break;
-			}
-
-			bufferBuilder = requestNewBufferBuilder(targetChannel);
-			result = serializer.copyToBufferBuilder(bufferBuilder);
-		}
-		checkState(!serializer.hasSerializedData(), "All data should be written at once");
+		targetPartition.emitRecord(serializeRecord(serializer, record), targetSubpartition);

 		if (flushAlways) {
-			flushTargetPartition(targetChannel);
+			targetPartition.flush(targetSubpartition);
 		}
-		return pruneTriggered;
 	}

 	public void broadcastEvent(AbstractEvent event) throws IOException {
 		broadcastEvent(event, false);
 	}

 	public void broadcastEvent(AbstractEvent event, boolean isPriorityEvent) throws IOException {
-		try (BufferConsumer eventBufferConsumer = EventSerializer.toBufferConsumer(event)) {
-			for (int targetChannel = 0; targetChannel < numberOfChannels; targetChannel++) {
-				tryFinishCurrentBufferBuilder(targetChannel);
-
-				// Retain the buffer so that it can be recycled by each channel of targetPartition
-				targetPartition.addBufferConsumer(eventBufferConsumer.copy(), targetChannel, isPriorityEvent);
-			}
+		targetPartition.broadcastEvent(event, isPriorityEvent);

-			if (flushAlways) {
-				flushAll();
-			}
+		if (flushAlways) {
+			flushAll();
 		}
 	}

-	public void flushAll() {
-		targetPartition.flushAll();
+	@VisibleForTesting
+	public static ByteBuffer serializeRecord(
Review comment:
It would be really great if this method were not public. Ideally we would
remove it completely, because all tests that use it bypass some crucial logic
of this class and may end up being meaningless.
This method is used in three places:
- The occurrence in `SingleInputGateTest` can be replaced with emitting a
record.
- The occurrence in `TestPartitionProducer` could be removed by adjusting
`TestProducerSource` to produce `ByteBuffer` instead of `BufferConsumer`, which
looks like a nice change that might even simplify things (see the sketch after
this list).
- The occurrence in `PartitionTestUtils` could in theory be kept, with the
visibility of the method reduced to package-private.
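For the `TestProducerSource` adjustment, a rough sketch of what the interface
could look like; the names used here (`getNextBuffer`, `BufferAndChannel`) are
illustrative assumptions, not existing Flink API:

import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: the source hands out raw serialized bytes plus a target
 * subpartition, and the test harness turns them into buffers via the regular
 * write path instead of constructing BufferConsumers directly.
 */
public interface TestProducerSource {

	/** Returns the next record to emit, or null once the source is exhausted. */
	BufferAndChannel getNextBuffer() throws Exception;

	/** Simple holder pairing the serialized record with its target subpartition. */
	final class BufferAndChannel {

		private final ByteBuffer buffer;

		private final int targetChannel;

		public BufferAndChannel(ByteBuffer buffer, int targetChannel) {
			this.buffer = buffer;
			this.targetChannel = targetChannel;
		}

		public ByteBuffer getBuffer() {
			return buffer;
		}

		public int getTargetChannel() {
			return targetChannel;
		}
	}
}

`TestPartitionProducer` could then push the serialized bytes through the
partition's regular `emitRecord(ByteBuffer, int)` path (the same call the new
`emit(...)` above uses), so the serialization helper would no longer need to be
public.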
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/BoundedBlockingResultPartition.java
##########
@@ -63,6 +63,22 @@ public BoundedBlockingResultPartition(
 			bufferPoolFactory);
 	}

+	@Override
+	public void flush(int targetSubpartition) {
+		finishBroadcastBufferBuilder();
Review comment:
Just to double-check: we do not want this to be the default behavior in
`BufferWritingResultPartition`, because it would finish the partial buffers in
the streaming/pipelined cases as well.
I think this logic may be confusing for future developers. What we could do is
the following (sketched in code after this list):
- `BufferWritingResultPartition` leaves the `void flush(int)` and
`flushAll()` methods abstract.
- Instead it offers `protected void flushSubpartition(int partition,
boolean finishProducers)` and `protected void flushAllSubpartitions(boolean
finishProducers)`. That makes it clear that there is a producer that may or may
not be finished, so the caller has to be aware of this behavior.
- The `BoundedBlockingResultPartition` then implements `flushAll() {
flushAllSubpartitions(true); }` and the `PipelinedResultPartition` implements
`flushAll() { flushAllSubpartitions(false); }`
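To make the intent concrete, a self-contained sketch of that structure;
`Subpartition` and `finishBufferBuilders` are simplified stand-ins for the real
`ResultSubpartition` and the existing finish logic, so only the flush shape is
the point here:

abstract class BufferWritingResultPartition {

	protected final Subpartition[] subpartitions;

	protected BufferWritingResultPartition(int numSubpartitions) {
		this.subpartitions = new Subpartition[numSubpartitions];
		for (int i = 0; i < numSubpartitions; i++) {
			subpartitions[i] = new Subpartition();
		}
	}

	// Left abstract: each subclass decides whether flushing finishes the producers.
	public abstract void flush(int targetSubpartition);

	public abstract void flushAll();

	// The explicit flag makes the finish-or-not decision visible at the call site.
	protected void flushSubpartition(int targetSubpartition, boolean finishProducers) {
		if (finishProducers) {
			finishBufferBuilders(targetSubpartition);
		}
		subpartitions[targetSubpartition].flush();
	}

	protected void flushAllSubpartitions(boolean finishProducers) {
		for (int i = 0; i < subpartitions.length; i++) {
			flushSubpartition(i, finishProducers);
		}
	}

	// Stand-in for finishBroadcastBufferBuilder() and the unicast finish logic.
	private void finishBufferBuilders(int targetSubpartition) {
		// finish partial broadcast/unicast buffers so readers see complete buffers
	}

	static final class Subpartition {
		void flush() {
			// notify the reader that data is available
		}
	}
}

// The blocking partition finishes partial buffers on flush, matching the diff above ...
final class BoundedBlockingResultPartition extends BufferWritingResultPartition {

	BoundedBlockingResultPartition(int numSubpartitions) {
		super(numSubpartitions);
	}

	@Override
	public void flush(int targetSubpartition) {
		flushSubpartition(targetSubpartition, true);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(true);
	}
}

// ... while the pipelined partition keeps the current buffer open for further records.
final class PipelinedResultPartition extends BufferWritingResultPartition {

	PipelinedResultPartition(int numSubpartitions) {
		super(numSubpartitions);
	}

	@Override
	public void flush(int targetSubpartition) {
		flushSubpartition(targetSubpartition, false);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(false);
	}
}

The boolean parameter is the key point: it forces every caller of the shared
flush helpers to state explicitly whether partial buffers get finished, instead
of hiding that decision in an override.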
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]