[jira] [Updated] (FLINK-972) Run Flink on Tez

Henry Saputra (JIRA) Tue, 24 Jun 2014 10:53:34 -0700

     [ 
https://issues.apache.org/jira/browse/FLINK-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Henry Saputra updated FLINK-972:
--------------------------------

    Description: 
The current status is:

  - A prototype that explores how Tez/Flink classes can interoperate was 
created by Filip Haase and is at 
https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto2

  - There is a version that runs "WordCount" in Tez, using the Flink input 
formats, output formats, and UDFs.


Next steps towards generic support for Flink programs are:

1) Integrate the Flink Memory Manager with Tez. This means actually defining 
how much memory of each container Flink may allocate for its internal 
algorithms. In Flink's core, we allow users to set the amount of memory, or 
define it relative to the heap size (with 0.7*heap_size) being used if nothing 
else is specified.

2) Create a version of the Flink task context (PactTaskContext) for Tez: This 
would allow to run the Flink runtime operators on a Tez processor.

3) Integrate Flink "ship strategies" (partitioning, replication, 
redistribution, etc) with the way Tez parameterizes connections.

4) Integrate the Flink Sorting/Caching with Tez. This should be simple, if the 
memory manager is there, these classes should work out of the box.

5) Create a component that creates the Tez DAG from a flink "OptimizedPlan". We 
currently have a component that creates a "Job Graph" (Flink's DAG) from an 
OptimizedPlan, it is the last step of the "pre-flight phase" before the job is 
given to the master to be scheduled. We need an equivalent component to create 
a Tez DAG.

6) Create a distribution that uses Tez as distributed runtime. Create a 
"client" that creates a Tez AM on Yarn and submits the DAG there. Adopt the 
bash scripts to pick up the Tez and Yarn parameters and set up the client 
correctly.


  was:
The current status is:

  - A prototype that explores how Tez/Flink classes can interoperate was 
created by Filip Haase and is at 
https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto2

  - There is a version that runs "WordCount" in Tez, using the Flink input 
formats, output formats, and UDFs.


Next steps towards generic support for Flink programs are:

1) Integrate the Flink Memory Manager with Tez. This means actually defining 
how much memory of each container Flink may allocate for its internal 
algorithms. In Flink's core, we allow users to set the amount of memory, or 
define it relative to the heap size (with 0.7*heap_size) being used if nothing 
else is specified.

2) Create a version of the Flink task context (PactTaskContext) for Tez: This 
would allow to run the Flink runtime operators on a Tez processor.

3) Integrate Flink "ship strategies" (partitioning, replication, 
redistribution, etc) with the way Tez parameterizes connections.

4) Integrate the Flink Sorting/Caching with Tez. This should be simple, if the 
memory manager is there, these classes should work out of the box.

5) Create a component that creates the Tez DAG from a flink "OptimizedPlan". We 
currently have a component that creates a "Job Graph" (Flink's DAG) from an 
OptimizedPlan, it is the last step of the "pre-flight phase" before the job is 
given to the master to be scheduled. We need an equivalent component to create 
a Tez DAG.



> Run Flink on Tez
> ----------------
>
>                 Key: FLINK-972
>                 URL: https://issues.apache.org/jira/browse/FLINK-972
>             Project: Flink
>          Issue Type: New Feature
>          Components: New Components
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>
> The current status is:
>   - A prototype that explores how Tez/Flink classes can interoperate was 
> created by Filip Haase and is at 
> https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto2
>   - There is a version that runs "WordCount" in Tez, using the Flink input 
> formats, output formats, and UDFs.
> Next steps towards generic support for Flink programs are:
> 1) Integrate the Flink Memory Manager with Tez. This means actually defining 
> how much memory of each container Flink may allocate for its internal 
> algorithms. In Flink's core, we allow users to set the amount of memory, or 
> define it relative to the heap size (with 0.7*heap_size) being used if 
> nothing else is specified.
> 2) Create a version of the Flink task context (PactTaskContext) for Tez: This 
> would allow to run the Flink runtime operators on a Tez processor.
> 3) Integrate Flink "ship strategies" (partitioning, replication, 
> redistribution, etc) with the way Tez parameterizes connections.
> 4) Integrate the Flink Sorting/Caching with Tez. This should be simple, if 
> the memory manager is there, these classes should work out of the box.
> 5) Create a component that creates the Tez DAG from a flink "OptimizedPlan". 
> We currently have a component that creates a "Job Graph" (Flink's DAG) from 
> an OptimizedPlan, it is the last step of the "pre-flight phase" before the 
> job is given to the master to be scheduled. We need an equivalent component 
> to create a Tez DAG.
> 6) Create a distribution that uses Tez as distributed runtime. Create a 
> "client" that creates a Tez AM on Yarn and submits the DAG there. Adopt the 
> bash scripts to pick up the Tez and Yarn parameters and set up the client 
> correctly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (FLINK-972) Run Flink on Tez

Reply via email to