do you know how to set the number of map/reduce tasks to something greater than 1 during hadoop archiving? i've tried -Dmapred.map.tasks=2 (we are on 0.19.2, unfortunately :( ) but in vain.
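for reference, this is roughly the kind of invocation i mean (paths and archive name are placeholders, and on 0.19.x the exact option set may differ):

```shell
# Sketch only -- needs a running Hadoop cluster; paths are placeholders.
# Generic -D options have to come before the tool-specific arguments.
hadoop archive \
  -Dmapred.map.tasks=2 \
  -archiveName foo.har \
  /user/manhee/input \
  /user/manhee/output
```

as far as i understand, mapred.map.tasks is only a hint anyway -- the actual map count comes from the splits the archiving job computes over the input.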

thanks,
manhee

----- Original Message ----- From: "Joey Echeverria" <[email protected]>
To: <[email protected]>
Sent: Tuesday, June 28, 2011 8:46 AM
Subject: Re: tar or hadoop archive


Yes, you can see a picture describing HAR files in this old blog post:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/

-Joey

On Mon, Jun 27, 2011 at 4:36 PM, Rita <[email protected]> wrote:
So it builds an index of the files?



On Mon, Jun 27, 2011 at 10:10 AM, Joey Echeverria <[email protected]> wrote:

The advantage of a hadoop archive file is that it lets you access the
files stored in it directly. For example, if you archived three files
(a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
of the three files using the hadoop command line:

hadoop fs -cat har:///user/joey/out/foo.har/a.txt

You can also copy files out of the archive or use files in the archive
as input to MapReduce jobs.
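A rough sketch of the whole round trip (all paths and names are placeholders,
and this assumes a running cluster):

```shell
# Sketch only -- requires a Hadoop cluster; paths are placeholders.
# Create the archive from a directory of small files:
hadoop archive -archiveName foo.har /user/joey/in /user/joey/out

# Read one member file directly through the har:// filesystem:
hadoop fs -cat har:///user/joey/out/foo.har/a.txt

# Copy a member back out to plain HDFS:
hadoop fs -cp har:///user/joey/out/foo.har/b.txt /user/joey/extracted/
```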

-Joey

On Mon, Jun 27, 2011 at 3:06 AM, Rita <[email protected]> wrote:
> We use hadoop/hdfs to archive data. I archive a lot of files by creating
> one large tar file and then placing it on hdfs. Is it better to use
> hadoop archive for this, or is it essentially the same thing?
>
> --
> --- Get your facts first, then you can distort them as you please.--
>



--
Joseph Echeverria
Cloudera, Inc.
443.305.9434










