How output files are written can be controlled. Perhaps you are using
SequenceFileOutputFormat; you can call setOutputFormat() to switch to
TextOutputFormat instead. I guess this should solve your problem!

On Mon, Aug 3, 2009 at 12:31 PM, Sugandha Naolekar <[email protected]> wrote:

> That's fine. But if I place the data in HDFS and then run MapReduce code
> to compress it, the data will get compressed into sequence files, but the
> original data will still reside in HDFS as well, thereby causing a kind of
> redundancy of data.
>
> Can you please suggest a way out?
>
> On Mon, Aug 3, 2009 at 12:07 PM, prashant ullegaddi <
> [email protected]> wrote:
>
> > I don't think you will be able to compress the data with MapReduce
> > unless it's on HDFS. What you can do is:
> > 1. Manually compress the data on the machine where it resides, then
> >    copy it to HDFS; or
> > 2. Copy the data to HDFS without compressing it, then run a job which
> >    just emits the data as it reads it, as key/value pairs. You can set
> >    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class) so
> >    that the output gets gzipped.
> >
> > Does that solve your problem?
> >
> > By the way, you didn't specify your data size exactly (how many TBs).
> >
> > On Mon, Aug 3, 2009 at 11:02 AM, Sugandha Naolekar
> > <[email protected]> wrote:
> >
> > > Yes, you are right. Here are the related details:
> > >
> > > -> I have a Hadoop cluster of 7 nodes. Now there is an 8th machine,
> > >    which is not part of the Hadoop cluster.
> > > -> I want to place the data from that machine into HDFS. Before
> > >    placing it in HDFS, I want to compress it, and then dump it in
> > >    HDFS.
> > > -> I have 4 datanodes in my cluster. Also, the data might grow to
> > >    terabytes.
> > > -> Also, I have set the replication factor to 2.
> > > -> I guess, for compression, I will have to run MapReduce, right?
> > >    Please tell me the complete approach that needs to be followed.
> > >
> > > On Mon, Aug 3, 2009 at 10:48 AM, prashant ullegaddi <
> > > [email protected]> wrote:
> > >
> > > > By "I want to compress the data first and then place it in HDFS",
> > > > do you mean you want to compress the data locally and then copy it
> > > > to DFS?
> > > >
> > > > What's the size of your data? What's the capacity of HDFS?
> > > >
> > > > On Mon, Aug 3, 2009 at 10:45 AM, Sugandha Naolekar
> > > > <[email protected]> wrote:
> > > >
> > > > > I want to compress the data first and then place it in HDFS.
> > > > > Again, while retrieving it, I want to uncompress it and place it
> > > > > at the desired destination. Is this possible? How do I get
> > > > > started? Also, I want to get started with the actual coding part
> > > > > of compression and MapReduce. Please advise!
> > > > >
> > > > > --
> > > > > Regards!
> > > > > Sugandha
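[For reference: Prashant's option 1 — compress locally, then copy to HDFS — needs no MapReduce at all. A minimal sketch of the compression step using only the standard java.util.zip classes; the class name GzipLocal and the method names are just illustrative, not from the thread.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLocal {

    // Gzip a byte[] in-process (same format the gzip command produces).
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(data);
            gz.close();                      // flushes the gzip trailer
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reverse direction, for the retrieval side of the question.
    static byte[] gunzip(byte[] compressed) {
        try {
            GZIPInputStream gz =
                new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "data headed for HDFS".getBytes();
        byte[] roundTrip = gunzip(gzip(original));
        System.out.println(java.util.Arrays.equals(original, roundTrip));
        // prints true
    }
}
```

The resulting .gz files can then be copied in with `bin/hadoop fs -put`. One caveat worth knowing: a plain .gz file is not splittable, so a single very large archive will be read by a single mapper in later jobs; compressing in reasonably sized per-file chunks keeps those jobs parallel.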

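[For reference: Prashant's option 2 — an identity job whose output is gzip-compressed — could look roughly like this with the 0.20-era org.apache.hadoop.mapred API. A sketch only: the class name CompressJob and the map-only setup are illustrative, and it needs a Hadoop installation on the classpath to compile and run.]

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class CompressJob {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CompressJob.class);
        conf.setJobName("gzip-copy");

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class); // plain text, not SequenceFile

        // Emit each record unchanged; map-only, so each input split
        // becomes one part-NNNNN.gz output file.
        conf.setMapperClass(IdentityMapper.class);
        conf.setNumReduceTasks(0);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Gzip the job output, as suggested in the thread. Note that
        // TextOutputFormat writes key<TAB>value, so with IdentityMapper
        // the byte-offset keys from TextInputFormat appear in the output.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```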