How output files are written can be controlled. Perhaps you are using
SequenceFileOutputFormat; you can call setOutputFormat() to switch to
TextOutputFormat instead. I guess this should solve your problem!

On Mon, Aug 3, 2009 at 12:31 PM, Sugandha Naolekar <[email protected]> wrote:

> That's fine. But if I place the data in HDFS and then run MapReduce code
> to compress it, the data will get compressed into sequence files, but the
> original data will still reside in HDFS as well, thereby causing a kind of
> redundancy of data.
>
> Can you please suggest a way out?
>
> On Mon, Aug 3, 2009 at 12:07 PM, prashant ullegaddi <
> [email protected]> wrote:
>
> > I don't think you will be able to compress the data with MapReduce
> > unless it's on HDFS. What you can do is:
> > 1. Manually compress the data on the machine where it resides, then
> >    copy it to HDFS; or
> > 2. Copy the data to HDFS without compressing it, then run a job which
> >    just emits the data as it reads it, as key/value pairs. You can set
> >    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class) so
> >    that the output gets gzipped.
> >
> > Does that solve your problem?
> >
> > By the way, you didn't specify your data size exactly (how many TBs).
> >
> > On Mon, Aug 3, 2009 at 11:02 AM, Sugandha Naolekar
> > <[email protected]> wrote:
> >
> > > Yes, you are right. Here are the related details:
> > >
> > > -> I have a Hadoop cluster of 7 nodes. Now there is an 8th machine,
> > >    which is not part of the Hadoop cluster.
> > > -> I want to place the data from that machine into HDFS. Before
> > >    placing it in HDFS, I want to compress it, and then dump it in
> > >    HDFS.
> > > -> I have 4 datanodes in my cluster. Also, the data might grow to
> > >    terabytes.
> > > -> Also, I have set the replication factor to 2.
> > > -> I guess, for compression, I will have to run MapReduce, right?
> > >    Please tell me the complete approach that needs to be followed.
> > >
> > > On Mon, Aug 3, 2009 at 10:48 AM, prashant ullegaddi <
> > > [email protected]> wrote:
> > >
> > > > By "I want to compress the data first and then place it in HDFS",
> > > > do you mean you want to compress the data locally and then copy it
> > > > to DFS?
> > > >
> > > > What's the size of your data? What's the capacity of HDFS?
> > > >
> > > > On Mon, Aug 3, 2009 at 10:45 AM, Sugandha Naolekar
> > > > <[email protected]> wrote:
> > > >
> > > > > I want to compress the data first and then place it in HDFS.
> > > > > Again, while retrieving it, I want to uncompress it and place it
> > > > > at the desired destination. Is this possible? How do I get
> > > > > started? Also, I want to get started with the actual coding part
> > > > > of compression and MapReduce. Please advise!
> > > > >
> > > > > --
> > > > > Regards!
> > > > > Sugandha
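[For reference: Prashant's option 1 — compress locally, then copy to HDFS — needs no MapReduce at all. A minimal sketch of the compression step using only the standard java.util.zip classes; the class name GzipLocal and the method names are just illustrative, not from the thread.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLocal {

    // Gzip a byte[] in-process (same format the gzip command produces).
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(data);
            gz.close();                      // flushes the gzip trailer
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reverse direction, for the retrieval side of the question.
    static byte[] gunzip(byte[] compressed) {
        try {
            GZIPInputStream gz =
                new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "data headed for HDFS".getBytes();
        byte[] roundTrip = gunzip(gzip(original));
        System.out.println(java.util.Arrays.equals(original, roundTrip));
        // prints true
    }
}
```

The resulting .gz files can then be copied in with `bin/hadoop fs -put`. One caveat worth knowing: a plain .gz file is not splittable, so a single very large archive will be read by a single mapper in later jobs; compressing in reasonably sized per-file chunks keeps those jobs parallel.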

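[For reference: Prashant's option 2 — an identity job whose output is gzip-compressed — could look roughly like this with the 0.20-era org.apache.hadoop.mapred API. A sketch only: the class name CompressJob and the map-only setup are illustrative, and it needs a Hadoop installation on the classpath to compile and run.]

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class CompressJob {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CompressJob.class);
        conf.setJobName("gzip-copy");

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class); // plain text, not SequenceFile

        // Emit each record unchanged; map-only, so each input split
        // becomes one part-NNNNN.gz output file.
        conf.setMapperClass(IdentityMapper.class);
        conf.setNumReduceTasks(0);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Gzip the job output, as suggested in the thread. Note that
        // TextOutputFormat writes key<TAB>value, so with IdentityMapper
        // the byte-offset keys from TextInputFormat appear in the output.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```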