By "previous files" I meant the job related files there. DataNodes are
persistent members in HDFS. A removal of a DN results in loss of
blocks. Usually you have replication handling failures of DN
flawlessly, but consider a 1-replication cluster. A DN downtime can't
be acceptable in that case.
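
As an illustration, here is a minimal sketch of raising a file's
replication factor through the standard FileSystem API so that losing
a single DN no longer takes its blocks offline (the path and factor
below are made up for the example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Ask HDFS to keep 3 copies of each block of this (hypothetical)
        // file, so losing one DataNode doesn't lose the data.
        fs.setReplication(new Path("/user/jay/data.txt"), (short) 3);
        fs.close();
      }
    }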

Writes to HDFS are done by streaming blocks directly to the DataNodes
(the NameNode only hands out block locations), so a JobClient does
need access to them in order to write its job-related files to HDFS.
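
Here is a minimal sketch of what such a client-side write looks like,
assuming the standard FileSystem API (the path and payload are
hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create() asks the NameNode for block allocations; the bytes
        // written below travel straight to the DataNodes, which is why
        // the client needs network access to them.
        FSDataOutputStream out = fs.create(new Path("/user/jay/job.jar"));
        out.write("payload".getBytes("UTF-8"));
        out.close();
        fs.close();
      }
    }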

On Sat, Apr 21, 2012 at 8:33 PM, JAX <jayunit...@gmail.com> wrote:
> Thanks j harsh:
> I have another question, though ---
>
> You mentioned that :
>
> The client needs access to
> " the
> DataNodes (for actually writing the previous files to DFS for the
> JobTracker to pick up)"
>
> What do you mean by previous files? It seems like, if designing Hadoop from
> scratch, I wouldn't want to force the client to communicate with DataNodes
> at all, since those can be added and removed during a job.
>
> Jay Vyas
> MMSB
> UCHC
>
> On Apr 21, 2012, at 1:14 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> the
>> DataNodes (for actually writing the previous files to DFS for the
>> JobTracker to pick up)



-- 
Harsh J
