zhuzhurk commented on a change in pull request #11647: [FLINK-16960][runtime]
Add PipelinedRegion interface
URL: https://github.com/apache/flink/pull/11647#discussion_r405984395
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/topology/Topology.java
##########
@@ -39,4 +39,26 @@
* @return whether the topology contains co-location constraints
*/
boolean containsCoLocationConstraints();
+
+ /**
+ * Returns all pipelined regions in this topology.
+ *
+ * @return Iterable over pipelined regions in this topology
+ */
+ default Iterable<PipelinedRegion<VID, RID, V, R>>
getAllPipelinedRegions() {
Review comment:
The POC should work, and I think it's fine to maintain two translation
algorithms, one for the logical graph and one for the execution graph.
My main concern is the time needed to translate the graph and the GC pressure
it causes, especially for large-scale jobs.
- For jobs with hundreds of millions of edges (e.g. a 10000x10000 map
reduce), it will take tens of seconds to build the ExecutionGraph, and it might
take another tens of seconds to create a translated graph.
- Besides that, hundreds of millions of temporary edge instances must be
created and kept alive at the same time in the translated graph. This raises
the JM memory requirement, and an OOM might happen if it is not met. It may
also cause GC issues, since these instances are no longer needed once the
regions are built.
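
To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. It is not Flink code: the class `TranslatedGraphEstimate`, its methods, and the 24-byte per-edge footprint are all assumptions made for illustration, and the real footprint depends on the JVM and on how a translated edge would actually be represented.

```java
public final class TranslatedGraphEstimate {

    // Hypothetical per-edge footprint in bytes (object header plus two
    // references). This is an assumed value, not a measurement.
    public static final long ASSUMED_BYTES_PER_EDGE = 24L;

    private TranslatedGraphEstimate() {}

    /** Number of edges in an all-to-all connection between two sets of vertices. */
    public static long allToAllEdges(long upstreamParallelism, long downstreamParallelism) {
        return Math.multiplyExact(upstreamParallelism, downstreamParallelism);
    }

    /** Heap needed if every edge is materialized as its own temporary object. */
    public static long temporaryEdgeBytes(long edges, long bytesPerEdge) {
        return Math.multiplyExact(edges, bytesPerEdge);
    }

    public static void main(String[] args) {
        // The 10000x10000 map reduce case from the comment above.
        long edges = allToAllEdges(10_000, 10_000);
        long bytes = temporaryEdgeBytes(edges, ASSUMED_BYTES_PER_EDGE);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("edges=%d, temporary edge heap ~%.1f GiB%n", edges, gib);
    }
}
```

Under these assumptions a single 10000x10000 all-to-all connection already yields 100 million temporary edge objects, all of which become garbage as soon as the regions are built.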