[ 
https://issues.apache.org/jira/browse/FLINK-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259110#comment-14259110
 ] 

Fabian Hueske commented on FLINK-1341:
--------------------------------------

Data in HDFS is usually replicated, i.e., each piece of data is stored on 
several different nodes in the cluster (the default replication factor is 3). 
Flink tries to assign each read operation to a node that holds a local copy. 
If a piece of data cannot be read locally by any node, it is read remotely, 
i.e., the data is sent over the network. This is similar for all systems that 
read data from HDFS, such as Hadoop and Spark.
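
To illustrate the idea, here is a minimal sketch of locality-aware assignment. 
It is not Flink's actual scheduler code: the function name, the node names, 
and the greedy least-loaded strategy are all assumptions made for this example.

```python
from collections import Counter


def assign_splits(split_replicas, task_managers):
    """Greedily assign each split to a TaskManager host, preferring local replicas.

    split_replicas: dict mapping a split name to the hosts storing a replica.
    task_managers:  list of hosts that run a TaskManager.
    Returns a dict mapping split name -> (chosen host, True if the read is local).
    """
    load = Counter({tm: 0 for tm in task_managers})
    assignment = {}
    for split, replica_hosts in split_replicas.items():
        # TaskManager hosts that also store a replica of this split
        local = [tm for tm in task_managers if tm in replica_hosts]
        # Fall back to any TaskManager (remote read) if none is local
        candidates = local or task_managers
        chosen = min(candidates, key=lambda tm: load[tm])
        load[chosen] += 1
        assignment[split] = (chosen, bool(local))
    return assignment


# Hypothetical cluster: TaskManagers run on 5 worker nodes, none of them node5
task_managers = ["node1", "node2", "node3", "node4", "node6"]
split_replicas = {
    "block-a": ["node5", "node2", "node9"],  # one replica is on a TaskManager host
    "block-b": ["node5", "node7", "node8"],  # no replica is on a TaskManager host
}
assignment = assign_splits(split_replicas, task_managers)
```

In this sketch, block-a is read locally on node2 because one of its replicas 
lives there, while block-b has no replica on any TaskManager host and is 
therefore read over the network.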

> In Hadoop cluster mode, Worker node has data but not TaskManager
> ----------------------------------------------------------------
>
>                 Key: FLINK-1341
>                 URL: https://issues.apache.org/jira/browse/FLINK-1341
>             Project: Flink
>          Issue Type: Wish
>          Components: JobManager, TaskManager
>         Environment: Hadoop 2.0,  YARN cluster
>            Reporter: Solaimurugan.V
>
> In my Hadoop 2.0 cluster setup, which has 12 nodes, 11 of them Workers, I'm 
> trying to set up Flink in cluster mode on top of the existing setup. I have 
> decided to run TaskManagers on 5 of the 11 Workers. What happens if I submit 
> a Flink job and the data exists on node 5, but no TaskManager is running on 
> node 5?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
