Hi,

Here is a piece of code that does the reverse of what you want: it takes a bunch of compressed files (gzip, in this case) and converts them to text. You can tweak the code to go the other way: http://pastebin.com/mBHVHtrm
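In case it helps, here is a rough sketch of that other direction (reading standard input and writing a gzip-compressed file on HDFS), following the steps Bejoy lists below: get the Hadoop conf, pick the codec you need, and write through a CompressionOutputStream. This is just an illustration, not the pastebin code; the class name and path are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

// Reads text from stdin and writes it as a gzip-compressed file on HDFS.
public class StdinToHdfsGzip {
    public static void main(String[] args) throws Exception {
        Path out = new Path(args[0]);  // e.g. /logs/2012/02/07/app.log.gz

        // 1. get the Hadoop conf (picks up core-site.xml from the classpath)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // 2. instantiate the codec you prefer (gzip here)
        CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, conf);

        // 3. wrap the HDFS output stream in a CompressionOutputStream
        CompressionOutputStream cos = codec.createOutputStream(fs.create(out));

        // copy standard input line by line into the compressed HDFS file
        BufferedReader in =
                new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            cos.write((line + "\n").getBytes("UTF-8"));
        }
        cos.finish();   // write the gzip trailer
        cos.close();
    }
}

Note that this creates a new compressed file each run; as Bejoy points out below, you cannot append to an already compressed file, so the usual pattern is to roll a new file per hour or per batch.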
Raj

> ________________________________
> From: Xiaobin She <[email protected]>
> To: [email protected]
> Cc: [email protected]; David Sinclair <[email protected]>
> Sent: Tuesday, February 7, 2012 1:11 AM
> Subject: Re: Can I write to a compressed file which is located in HDFS?
>
> Thank you Bejoy, I will look at that book.
>
> Thanks again!
>
>
> 2012/2/7 <[email protected]>
>
>> Hi
>> AFAIK I don't think it is possible to append to a compressed file.
>>
>> If you have files in a directory in HDFS and you need to compress them
>> (like the files for an hour), you can use MapReduce to do that by
>> setting mapred.output.compress = true and
>> mapred.output.compression.codec='theCodecYouPrefer'.
>> You'd get the compressed blocks in the output dir.
>>
>> You can use the API to read from standard input like this:
>> - get the Hadoop conf
>> - register the required compression codec
>> - write to a CompressionOutputStream
>>
>> You should find a well-detailed explanation of all this in the book
>> 'Hadoop: The Definitive Guide' by Tom White.
>>
>> Regards
>> Bejoy K S
>>
>> From handheld, please excuse typos.
>> ------------------------------
>> *From:* Xiaobin She <[email protected]>
>> *Date:* Tue, 7 Feb 2012 14:24:01 +0800
>> *To:* <[email protected]>; <[email protected]>; David Sinclair <[email protected]>
>> *Subject:* Re: Can I write to a compressed file which is located in HDFS?
>>
>> Hi Bejoy and David,
>>
>> Thank you for your help.
>>
>> So I can't directly write or append logs to a compressed file in HDFS,
>> right?
>>
>> Can I compress a file which is already in HDFS and has not been
>> compressed? If I can, how do I do that?
>>
>> Thanks!
>>
>>
>> 2012/2/6 <[email protected]>
>>
>>> Hi
>>> I agree with David on that point; you can achieve step 1 of my
>>> previous response with Flume, i.e. load a real-time inflow of data
>>> into HDFS in compressed format. You can specify a time interval or a
>>> data size in the Flume collector that determines when to flush data
>>> onto HDFS.
>>>
>>> Regards
>>> Bejoy K S
>>>
>>> From handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: David Sinclair <[email protected]>
>>> Date: Mon, 6 Feb 2012 09:06:00
>>> To: <[email protected]>
>>> Cc: <[email protected]>
>>> Subject: Re: Can I write to a compressed file which is located in HDFS?
>>>
>>> Hi,
>>>
>>> You may want to have a look at the Flume project from Cloudera. I use
>>> it for writing data into HDFS.
>>>
>>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>>
>>> dave
>>>
>>> 2012/2/6 Xiaobin She <[email protected]>
>>>
>>> > Hi Bejoy,
>>> >
>>> > Thank you for your reply.
>>> >
>>> > I have actually set up a test cluster with one namenode/jobtracker
>>> > and two datanodes/tasktrackers, and I have run a test on it.
>>> >
>>> > I fetch the log file of one of our modules from the log collector
>>> > machines with rsync, and then I use the Hive command-line tool to
>>> > load this log file into the Hive warehouse, which simply copies the
>>> > file from the local filesystem to HDFS.
>>> >
>>> > I have run some analysis on this data with Hive, and all of it went
>>> > well.
>>> >
>>> > But now I want to avoid the fetch step that uses rsync, and instead
>>> > write the logs into HDFS files directly from the servers that
>>> > generate them.
>>> >
>>> > That seems easy to do if the file located in HDFS is not compressed.
>>> >
>>> > But how do I write or append logs to a file that is compressed and
>>> > located in HDFS?
>>> >
>>> > Is this possible? Or is it bad practice?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2012/2/6 <[email protected]>
>>> >
>>> > > Hi
>>> > > If you have enough log files to fill at least one block in an
>>> > > hour, you can go ahead as follows:
>>> > > - run a scheduled job every hour that compresses the log files
>>> > > for that hour and stores them in HDFS (you can use LZO or even
>>> > > Snappy to compress)
>>> > > - if your Hive jobs analyse this data frequently, store it
>>> > > PARTITIONED BY (Date, Hour). While loading into HDFS, also follow
>>> > > a directory/sub-directory structure. Once the data is in HDFS,
>>> > > issue an ALTER TABLE ... ADD PARTITION statement on the
>>> > > corresponding Hive table.
>>> > > - in the Hive DDL, use the appropriate input format (Hive already
>>> > > ships an Apache log input format)
>>> > >
>>> > > Regards
>>> > > Bejoy K S
>>> > >
>>> > > From handheld, please excuse typos.
>>> > >
>>> > > -----Original Message-----
>>> > > From: Xiaobin She <[email protected]>
>>> > > Date: Mon, 6 Feb 2012 16:41:50
>>> > > To: <[email protected]>; 佘晓彬<[email protected]>
>>> > > Reply-To: [email protected]
>>> > > Subject: Re: Can I write to a compressed file which is located in HDFS?
>>> > >
>>> > > Sorry, this sentence is wrong:
>>> > >
>>> > > "I can't compress these logs every hour and then put them into HDFS."
>>> > >
>>> > > It should be:
>>> > >
>>> > > "I can compress these logs every hour and then put them into HDFS."
>>> > >
>>> > >
>>> > > 2012/2/6 Xiaobin She <[email protected]>
>>> > >
>>> > > >
>>> > > > Hi all,
>>> > > >
>>> > > > I'm testing Hadoop and Hive, and I want to use them for log
>>> > > > analysis.
>>> > > >
>>> > > > I have a question: can I write/append logs to a compressed file
>>> > > > which is located in HDFS?
>>> > > >
>>> > > > Our system generates lots of log files every day; I can't
>>> > > > compress these logs every hour and then put them into HDFS.
>>> > > >
>>> > > > But what if I want to write logs into files that are already in
>>> > > > HDFS and compressed?
>>> > > >
>>> > > > If these files were not compressed, this job would seem easy,
>>> > > > but how do I write or append logs to a compressed file?
>>> > > >
>>> > > > Can I do that?
>>> > > >
>>> > > > Can anyone give me some advice or some examples?
>>> > > >
>>> > > > Thank you very much!
>>> > > >
>>> > > > xiaobin
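PS: if you go with the hourly MapReduce compression Bejoy describes in the thread above, a minimal map-only job might look roughly like the following. Again just a sketch with placeholder names, using the mapred.output.compress settings he quotes:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Map-only job that re-writes plain-text files in HDFS as compressed ones.
public class CompressLogs {

    // Pass each line through unchanged; the NullWritable key keeps
    // TextOutputFormat from prepending byte offsets to the output.
    public static class CatMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<NullWritable, Text> out,
                        Reporter reporter) throws IOException {
            out.collect(NullWritable.get(), line);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(new Configuration(), CompressLogs.class);
        job.setJobName("compress-hourly-logs");

        job.setMapperClass(CatMapper.class);
        job.setNumReduceTasks(0);               // map-only: no shuffle needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        // the two settings from the thread
        job.setBoolean("mapred.output.compress", true);
        job.setClass("mapred.output.compression.codec",
                     GzipCodec.class, CompressionCodec.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        JobClient.runJob(job);
    }
}

You'd run it with something like 'hadoop jar logtools.jar CompressLogs /logs/raw/2012/02/06 /logs/gz/2012/02/06', where the jar name and paths are made up for the example; the output directory then holds the compressed files, ready for an ALTER TABLE ... ADD PARTITION on the Hive table.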
