Thank you Jason!

On Wed, Jul 1, 2009 at 5:26 PM, jason hadoop <[email protected]> wrote:

> The directory returned by getWorkOutputPath is a task-specific directory,
> to be used for files that should be part of the final output of the job.
>
> If you want to write to the task-local directory, use the local file
> system API, and paths relative to '.'.
> The parameter mapred.local.dir will contain the name of the local
> directory.
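
Jason's advice above depends on mapred.local.dir being set for the tasktracker. For reference, a minimal sketch of how that property is typically declared (in hadoop-site.xml on 0.19.x, mapred-site.xml in later releases); the path shown is a hypothetical example, not one taken from this thread:

```xml
<!-- local scratch space used by the tasktracker for task files -->
<property>
  <name>mapred.local.dir</name>
  <!-- may be a comma-separated list of local directories; hypothetical path -->
  <value>/home/bon/my_hdfiles/temp_0.19.1/mapred/local</value>
</property>
```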
>
>
> On Wed, Jul 1, 2009 at 9:19 AM, bonito perdo <[email protected]> wrote:
>
> > Thank you for your immediate response.
> > In this case, what is the difference from the path obtained from
> > FileOutputFormat.getWorkOutputPath(job)? That path refers to HDFS...
> >
> > Thank you.
> >
> >
> > On Wed, Jul 1, 2009 at 5:13 PM, jason hadoop <[email protected]> wrote:
> >
> > > The parameter mapred.local.dir controls the directory used by the
> > > tasktracker for map/reduce jobs' local files.
> > >
> > > The dfs.data.dir parameter is for the datanode.
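
The distinction Jason draws here is visible in the configuration files themselves. A minimal sketch of how dfs.data.dir is typically declared (hadoop-site.xml on 0.19.x, hdfs-site.xml in later releases); the path mirrors the value bonito reports below:

```xml
<!-- where the DataNode stores HDFS block data on the local disk -->
<property>
  <name>dfs.data.dir</name>
  <value>/home/bon/my_hdfiles/temp_0.19.1/dfs/data</value>
</property>
```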
> > >
> > > On Wed, Jul 1, 2009 at 8:56 AM, bonito <[email protected]> wrote:
> > >
> > > >
> > > > Hello,
> > > > I am a bit confused about the local directories where each map/reduce
> > > > task can store data.
> > > > According to what I have read, dfs.data.dir is the path on the local
> > > > file system in which the DataNode instance should store its data. That
> > > > is, since we have a number of individual nodes, this is the place where
> > > > each node can store its own data. Right?
> > > > This data may be part of a, let's say, file stored under the HDFS
> > > > namespace?
> > > > The value of this property for my configuration is:
> > > >     /home/bon/my_hdfiles/temp_0.19.1/dfs/data
> > > > As far as I can understand, this path refers to the local "disk" of
> > > > each node.
> > > >
> > > > Moreover, calling FileOutputFormat.getWorkOutputPath(job) we obtain
> > > > the Path to the task's temporary output directory for the map-reduce
> > > > job. This path is totally different from the previous one, which
> > > > confuses me, since the temporary output of each task should be written
> > > > locally to the node's disk. The path I retrieve is:
> > > >
> > > > hdfs://localhost:9000/user/bon/keys_fil.txt/_temporary/_attempt_200907011515_0009_m_000000_0
> > > >
> > > > Does this path refer to the local disk (node)? Or is it possible that
> > > > it may refer to another node in the cluster?
> > > >
> > > > Any clarification would be of great help.
> > > >
> > > > Thank you.
> > > > --
> > > > View this message in context:
> > > > http://www.nabble.com/local-directory-tp24292289p24292289.html
> > > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> > > >
> > > >
> > >
> > >
> > > --
> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > > www.prohadoopbook.com a community for Hadoop Professionals
> > >
> >
>
>
>
>
