[
https://issues.apache.org/jira/browse/FLINK-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruidong Li updated FLINK-12405:
-------------------------------
Description:
The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to
{{BLOCKING}} except it might be consumed for several times and will be released
after TM shutdown or {{ResultPartition}} removal request.
This is the basis for Interactive Programming.
Here is the brief changes:
* Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}}
* Introduce {{BlockingShuffleOutputFormat}} which contains a user specified
{{IntermediateDataSetID}}(passed from TableAPI in later PR)
* when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with
{{BlockingShuffleOutputFormat}}, it creates a {{IntermediateDataSet}} with this
id, then add it to its predecessor, the {{OutputFormatVertex}} for this
{{GenericDataSinkBase}} will be excluded in {{JobGraph}}
* So the JobGraph may contains some JobVertex which has more
{{IntermediateDataSet}} than its downstream consumers.
Here are some design notes:
* Why modify {{DataSet}} and {{JobGraphGenerator}}
Since Blink Planner is not ready yet, and Batch Table is running on Flink
Planner(based on DataSet).
There will be another implementation once Blink Planner is ready.
* Why use a special {{OutputFormat}} as placeholder
We could add a {{cache()}} method for DataSet, but we do not want to change
DataSet API any more. so a special {{OutputFormat}} as placeholder seems
reasonable.
was:
The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to
{{BLOCKING}} except it might be consumed for several times and will be released
after TM shutdown or {{ResultPartition}} removal request.
This is the basis for Interactive Programming.
Here is the brief changes:
* Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}}
* Introduce {{BlockingShuffleOutputFormat}} and {{BlockingShuffleDataSink}}
which are used to generate {{GenericDataSinkBase}} with user specified
{{IntermediateDataSetID}} (passed from TableAPI in later PR)
* when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with
{{IntermediateDataSetID}}, it creates a {{IntermediateDataSet}} with this id,
then {{OutputFormatVertex}} for this {{GenericDataSinkBase}} will be excluded
in {{JobGraph}}
* So the JobGraph may contains some JobVertex which has more
{{IntermediateDataSet}} than its downstream consumers.
Here are some design notes:
* Why modify {{DataSet}} and {{JobGraphGenerator}}
Since Blink Planner is not ready yet, and Batch Table is running on Flink
Planner(based on DataSet).
There will be another implementation once Blink Planner is ready.
* Why use a special {{OutputFormat}} as placeholder
We could add a {{cache()}} method for DataSet, but we do not want to change
DataSet API any more. so a special {{OutputFormat}} as placeholder seems
reasonable.
> Introduce BLOCKING_PERSISTENT result partition type
> ---------------------------------------------------
>
> Key: FLINK-12405
> URL: https://issues.apache.org/jira/browse/FLINK-12405
> Project: Flink
> Issue Type: Sub-task
> Components: API / DataSet
> Reporter: Ruidong Li
> Assignee: Ruidong Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The new {{ResultPartitionType}} : {{BLOCKING_PERSISTENT}} is similar to
> {{BLOCKING}} except it might be consumed for several times and will be
> released after TM shutdown or {{ResultPartition}} removal request.
> This is the basis for Interactive Programming.
> Here is the brief changes:
> * Introduce {{ResultPartitionType}} called {{BLOCKING_PERSISTENT}}
> * Introduce {{BlockingShuffleOutputFormat}} which contains a user specified
> {{IntermediateDataSetID}}(passed from TableAPI in later PR)
> * when {{JobGraphGenerator}} sees a {{GenericDataSinkBase}} with
> {{BlockingShuffleOutputFormat}}, it creates a {{IntermediateDataSet}} with
> this id, then add it to its predecessor, the {{OutputFormatVertex}} for this
> {{GenericDataSinkBase}} will be excluded in {{JobGraph}}
> * So the JobGraph may contains some JobVertex which has more
> {{IntermediateDataSet}} than its downstream consumers.
> Here are some design notes:
> * Why modify {{DataSet}} and {{JobGraphGenerator}}
> Since Blink Planner is not ready yet, and Batch Table is running on Flink
> Planner(based on DataSet).
> There will be another implementation once Blink Planner is ready.
> * Why use a special {{OutputFormat}} as placeholder
> We could add a {{cache()}} method for DataSet, but we do not want to change
> DataSet API any more. so a special {{OutputFormat}} as placeholder seems
> reasonable.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)