rkhachatryan commented on a change in pull request #13648:
URL: https://github.com/apache/flink/pull/13648#discussion_r507745730



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartitionType.java
##########
@@ -71,7 +71,17 @@
         * <p>For batch jobs, it will be best to keep this unlimited ({@link #PIPELINED}) since there are
         * no checkpoint barriers.
         */
-       PIPELINED_BOUNDED(true, true, true, false);
+       PIPELINED_BOUNDED(true, true, true, false),
+
+       /**
+        * Pipelined partitions with a bounded (local) buffer pool to support a downstream task to
+        * continue consuming data after reconnection in Approximate Local-Recovery.
+        *
+        * <p>Pipelined results can be consumed only once by a single consumer at one time.
+        * {@link #PIPELINED_APPROXIMATE} is different from {@link #PIPELINED_BOUNDED} in that
+        * {@link #PIPELINED_APPROXIMATE} is not decomposed automatically after consumption.
+        */
+       PIPELINED_APPROXIMATE(true, true, true, true);

Review comment:
       Can you please explain why this partition type is bounded?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",

Review comment:
       I think this message should mention that a new view is being created.
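   For example (exact wording here is only a sketch):
   ```
   LOG.info("{} ReadView for Subpartition {} of {} has not been released yet; releasing it and creating a new one.",
       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
   ```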

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+                                       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+                               releaseView();
+                       }
+
+                       LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+                               parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+                       readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+               }
+
+               return readView;
+       }
+
+       @Override
+       Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+               if (isPartialBuffer) {
+                       isPartialBuffer = !buffer.cleanupPartialRecord();
+               }
+
+               return buffer.build();
+       }
+
+       void releaseView() {
+               LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+               readView = null;

Review comment:
       The writes in this method should be done under a lock, right?
   But I'm not sure that all execution paths acquire it.
   Should we add `synchronized (buffers)` or `checkState(Thread.holdsLock(buffers))`?
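   A minimal sketch of the assertion variant (assuming `buffers` is the lock object guarding this state):
   ```
   void releaseView() {
       checkState(Thread.holdsLock(buffers), "releaseView() must be called with the buffers lock held");
       // ... existing reset logic ...
   }
   ```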
   

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+                                       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+                               releaseView();
+                       }
+
+                       LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+                               parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+                       readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+               }
+
+               return readView;
+       }
+
+       @Override
+       Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+               if (isPartialBuffer) {
+                       isPartialBuffer = !buffer.cleanupPartialRecord();
+               }
+
+               return buffer.build();
+       }
+
+       void releaseView() {
+               LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+               readView = null;
+               isPartialBuffer = true;
+               isBlockedByCheckpoint = false;
+               sequenceNumber = 0;
+       }
+
+       @Override
+       public String toString() {

Review comment:
       I couldn't find any differences from `super.toString` other than the class name.
   Can we just replace the hard-coded `"PipelinedSubpartition"` in the super class with `getClass().getSimpleName()` instead of overriding?
   
   ditto: view
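   A rough sketch of the idea in the super class (the remaining fields of the existing `toString` output are omitted here):
   ```
   @Override
   public String toString() {
       // use the runtime class name instead of the hard-coded "PipelinedSubpartition"
       return String.format("%s#%d [...]", getClass().getSimpleName(), getSubPartitionIndex());
   }
   ```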

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+                                       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+                               releaseView();
+                       }
+
+                       LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+                               parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+                       readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+               }
+
+               return readView;
+       }
+
+       @Override
+       Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+               if (isPartialBuffer) {
+                       isPartialBuffer = !buffer.cleanupPartialRecord();
+               }
+
+               return buffer.build();

Review comment:
       nit: `super.buildSliceBuffer` ?
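   i.e. something like this, assuming the base implementation simply builds the buffer:
   ```
   @Override
   Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
       if (isPartialBuffer) {
           isPartialBuffer = !buffer.cleanupPartialRecord();
       }
       // delegate to the base class instead of calling buffer.build() directly
       return super.buildSliceBuffer(buffer);
   }
   ```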

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartitionView.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * View over a pipelined in-memory only subpartition allowing reconnecting.
+ */
+public class PipelinedApproximateSubpartitionView extends PipelinedSubpartitionView {
+
+       PipelinedApproximateSubpartitionView(PipelinedApproximateSubpartition parent, BufferAvailabilityListener listener) {
+               super(parent, listener);
+       }
+
+       @Override
+       public void releaseAllResources() {

Review comment:
       I think this method is called not only upon a downstream RPC, but also on task shutdown and in other cases.
   If so, completely skipping `super.releaseAllResources` can lead to resource leaks in those cases.
   WDYT?
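   One possible shape (only a sketch; both the flag and the helper below are hypothetical and not part of this PR):
   ```
   @Override
   public void releaseAllResources() {
       if (isReconnectRelease) {
           // downstream reconnect: keep the subpartition data so a new view can resume consumption
           detachViewOnly();
       } else {
           // task shutdown and error paths still need to free all resources
           super.releaseAllResources();
       }
   }
   ```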

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedSubpartition.java
##########
@@ -259,9 +260,10 @@ BufferAndBacklog pollBuffer() {
                        }
 
                        while (!buffers.isEmpty()) {
-                               BufferConsumer bufferConsumer = buffers.peek().getBufferConsumer();
+                               BufferConsumerWithPartialRecordLength bufferConsumerWithPartialRecordLength = buffers.peek();
+                               BufferConsumer bufferConsumer = requireNonNull(bufferConsumerWithPartialRecordLength).getBufferConsumer();

Review comment:
       I think there is no point in adding an explicit `requireNonNull` just before dereferencing it.
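   The dereference would throw the same `NullPointerException` anyway, so this could simply be:
   ```
   BufferConsumerWithPartialRecordLength bufferConsumerWithPartialRecordLength = buffers.peek();
   BufferConsumer bufferConsumer = bufferConsumerWithPartialRecordLength.getBufferConsumer();
   ```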

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+                                       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+                               releaseView();
+                       }
+
+                       LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+                               parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+                       readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+               }
+
+               return readView;
+       }
+
+       @Override
+       Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+               if (isPartialBuffer) {
+                       isPartialBuffer = !buffer.cleanupPartialRecord();
+               }
+
+               return buffer.build();
+       }
+
+       void releaseView() {
+               LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+               readView = null;

Review comment:
       I'm concerned about a potential race condition here (even with `synchronized` added).
   
   Consider this case:
   Thread1: calls `subpartition.createReadView()` - creates `view1`
   Thread2: obtains a reference to `view1`
   Thread1: calls `subpartition.createReadView()` - creates `view2`
   Thread2: calls `view1.releaseAllResources` <-- nulls out `subpartition.readView`; `view2` is now corrupt?
   
   WDYT?
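   One way to guard against that (just a sketch; passing the view into `releaseView` is a hypothetical change): only reset the subpartition state if the releasing view is still the active one, e.g.
   ```
   void releaseView(PipelinedSubpartitionView viewToRelease) {
       synchronized (buffers) {
           // a stale view released after a newer one was created becomes a no-op
           if (readView == viewToRelease) {
               readView = null;
               isPartialBuffer = true;
               isBlockedByCheckpoint = false;
               sequenceNumber = 0;
           }
       }
   }
   ```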

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows reconnecting after failure.
+ * Only one view is allowed at a time to read the subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+       private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+       private boolean isPartialBuffer = false;
+
+       PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+               super(index, parent);
+       }
+
+       @Override
+       public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+               synchronized (buffers) {
+                       checkState(!isReleased);
+
+                       // if the view is not released yet
+                       if (readView != null) {
+                               LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+                                       parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+                               releaseView();
+                       }
+
+                       LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+                               parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+                       readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+               }
+
+               return readView;
+       }
+
+       @Override
+       Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+               if (isPartialBuffer) {
+                       isPartialBuffer = !buffer.cleanupPartialRecord();
+               }
+
+               return buffer.build();
+       }
+
+       void releaseView() {
+               LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+               readView = null;
+               isPartialBuffer = true;

Review comment:
       The name `isPartialBuffer` is a bit misleading to me because it implies that a partial buffer was emitted.
   But in fact, this field reflects that the view was released.
   How about `isPartialBufferCleanupRequired`?
   

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartitionFactory.java
##########
@@ -130,8 +132,15 @@ public ResultPartition create(
                                bufferCompressor,
                                bufferPoolFactory);
 
+                       BiFunction<Integer, PipelinedResultPartition, PipelinedSubpartition> factory;
+                       if (type == ResultPartitionType.PIPELINED_APPROXIMATE) {
+                               factory = PipelinedApproximateSubpartition::new;
+                       } else {
+                               factory = PipelinedSubpartition::new;
+                       }
+

Review comment:
       nit: I'd prefer a simple ternary inside the loop:
   ```
   for (int i = 0; i < subpartitions.length; i++) {
       subpartitions[i] = type == ResultPartitionType.PIPELINED_APPROXIMATE ?
           new PipelinedApproximateSubpartition(i, pipelinedPartition) :
           new PipelinedSubpartition(i, pipelinedPartition);
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

