By "previous files" I meant the job related files there. DataNodes are persistent members in HDFS. A removal of a DN results in loss of blocks. Usually you have replication handling failures of DN flawlessly, but consider a 1-replication cluster. A DN downtime can't be acceptable in that case.
Writes to HDFS are done by writing blocks directly to the DataNodes, so a JobClient does need access to them in order to write its job-related files into HDFS (a rough sketch of that staging step follows below the quoted thread).

On Sat, Apr 21, 2012 at 8:33 PM, JAX <jayunit...@gmail.com> wrote:
> Thanks j harsh:
> I have another question , though ---
>
> You mentioned that :
>
> The client needs access to
> " the
> DataNodes (for actually writing the previous files to DFS for the
> JobTracker to pick up)"
>
> What do you mean by previous files? It seems like, if designing Hadoop from
> scratch , I wouldn't want to force the client to communicate with data nodes
> at all, since those can be added and removed during a job.
>
> Jay Vyas
> MMSB
> UCHC
>
> On Apr 21, 2012, at 1:14 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> the
>> DataNodes (for actually writing the previous files to DFS for the
>> JobTracker to pick up)

--
Harsh J
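P.S. A minimal sketch of that staging step, assuming the plain FileSystem API; the staging directory, file names, and NameNode address are illustrative, not the exact JobClient internals:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageJobFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:8020"); // assumed address
    FileSystem fs = FileSystem.get(conf);
    // Illustrative per-job submit directory; the real JobClient uses
    // a similar staging area under the configured system directory.
    Path submitDir = new Path("/tmp/hadoop/mapred/staging/job_0001");
    // copyFromLocalFile asks the NameNode only for block locations;
    // the file bytes themselves stream straight to the DataNodes.
    fs.copyFromLocalFile(new Path("job.jar"), new Path(submitDir, "job.jar"));
    fs.copyFromLocalFile(new Path("job.xml"), new Path(submitDir, "job.xml"));
    fs.close();
  }
}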