Hi Xiaobin, what build of Hadoop are you using, and what type of compression is being used?
Thanks,

2012/2/7 Xiaobin She <[email protected]>

> thank you Bejoy, I will look at that book.
>
> Thanks again!
>
>
> 2012/2/7 <[email protected]>
>
> > Hi
> > AFAIK I don't think it is possible to append to a compressed file.
> >
> > If you have files in hdfs in a dir and you need to compress them (like
> > files for an hour), you can use MapReduce to do that by setting
> > mapred.output.compress = true and
> > mapred.output.compression.codec = 'theCodecYouPrefer'.
> > You'd get the blocks compressed in the output dir.
> >
> > You can use the API to read from standard input like:
> > - get the Hadoop conf
> > - register the required compression codec
> > - write to a CompressionOutputStream.
> >
> > You should find a well-detailed explanation of the same in the book
> > 'Hadoop - The Definitive Guide' by Tom White.
> >
> > Regards
> > Bejoy K S
> >
> > From handheld, Please excuse typos.
> > ------------------------------
> > From: Xiaobin She <[email protected]>
> > Date: Tue, 7 Feb 2012 14:24:01 +0800
> > To: <[email protected]>; <[email protected]>; David Sinclair <[email protected]>
> > Subject: Re: Can I write to a compressed file which is located in hdfs?
> >
> > hi Bejoy and David,
> >
> > thank you for your help.
> >
> > So I can't directly write logs or append logs to a compressed file in
> > hdfs, right?
> >
> > Can I compress a file which is already in hdfs and has not been
> > compressed?
> >
> > If I can, how can I do that?
> >
> > Thanks!
> >
> >
> > 2012/2/6 <[email protected]>
> >
> >> Hi
> >> I agree with David on that point; you can achieve step 1 of my
> >> previous response with Flume, i.e. load a real-time inflow of data in
> >> compressed format into hdfs. You can specify a time interval or data size
> >> in the Flume collector that determines when to flush data onto hdfs.
> >>
> >> Regards
> >> Bejoy K S
> >>
> >> From handheld, Please excuse typos.
> >>
> >> -----Original Message-----
> >> From: David Sinclair <[email protected]>
> >> Date: Mon, 6 Feb 2012 09:06:00
> >> To: <[email protected]>
> >> Cc: <[email protected]>
> >> Subject: Re: Can I write to a compressed file which is located in hdfs?
> >>
> >> Hi,
> >>
> >> You may want to have a look at the Flume project from Cloudera. I use it
> >> for writing data into HDFS.
> >>
> >> https://ccp.cloudera.com/display/SUPPORT/Downloads
> >>
> >> dave
> >>
> >> 2012/2/6 Xiaobin She <[email protected]>
> >>
> >> > hi Bejoy,
> >> >
> >> > thank you for your reply.
> >> >
> >> > actually I have set up a test cluster which has one namenode/jobtracker
> >> > and two datanode/tasktrackers, and I have run a test on this cluster.
> >> >
> >> > I fetch the log file of one of our modules from the log collector machines
> >> > by rsync, and then I use the hive command line tool to load this log file
> >> > into the hive warehouse, which simply copies the file from the local
> >> > filesystem to hdfs.
> >> >
> >> > And I have run some analysis on this data with hive; all of this ran well.
> >> >
> >> > But now I want to avoid the fetch step which uses rsync, and write the
> >> > logs into hdfs files directly from the servers which generate these logs.
> >> >
> >> > And it seems easy to do this job if the file located in hdfs is not
> >> > compressed.
> >> >
> >> > But how do I write or append logs to a file that is compressed and
> >> > located in hdfs?
> >> >
> >> > Is this possible?
> >> >
> >> > Or is this a bad practice?
> >> >
> >> > Thanks!
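For reference, here is a minimal sketch of the direct-write route Bejoy outlines above (get the Hadoop conf, instantiate the codec you prefer, write through a CompressionOutputStream). Note it creates a new compressed file in HDFS rather than appending to an existing one; the class name, GzipCodec choice, and target path are assumptions for illustration only.

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class StdinToCompressedHdfsFile {
  public static void main(String[] args) throws Exception {
    // 1. Get the Hadoop conf (picks up core-site.xml / hdfs-site.xml from the classpath).
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // 2. Instantiate the codec you prefer (GzipCodec is just an assumption here;
    //    LZO or Snappy would need the corresponding codec classes installed).
    Class<?> codecClass = Class.forName("org.apache.hadoop.io.compress.GzipCodec");
    CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    // 3. Write standard input through a CompressionOutputStream into a new HDFS file.
    //    The target path is hypothetical; this creates a file, it does not append.
    Path out = new Path("/logs/2012/02/07/app-14.log.gz");
    OutputStream rawOut = fs.create(out);
    CompressionOutputStream compressedOut = codec.createOutputStream(rawOut);

    IOUtils.copyBytes(System.in, compressedOut, 4096, false);
    compressedOut.finish();   // flush the compressed trailer
    compressedOut.close();
  }
}

Compiled against the Hadoop client libraries, something like this could be fed from the log pipeline, e.g. tail -F app.log | hadoop jar mytool.jar StdinToCompressedHdfsFile (jar and class names are hypothetical), but each run produces a whole new file; there is no append into an existing compressed file.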
> >> >
> >> >
> >> > 2012/2/6 <[email protected]>
> >> >
> >> > > Hi
> >> > > If you have enough log files to reach at least one block size in an
> >> > > hour, you can go ahead as follows:
> >> > > - run a scheduled job every hour that compresses the log files for that
> >> > > hour and stores them onto hdfs (can use LZO or even Snappy to compress)
> >> > > - if your hive does more frequent analysis on this data, store it as
> >> > > PARTITIONED BY (Date, Hour). While loading into hdfs also follow a
> >> > > directory / sub-dir structure. Once data is in hdfs, issue an Alter Table
> >> > > Add Partition statement on the corresponding hive table.
> >> > > - in the Hive DDL use the appropriate input format (Hive already has some
> >> > > ApacheLog input formats)
> >> > >
> >> > > Regards
> >> > > Bejoy K S
> >> > >
> >> > > From handheld, Please excuse typos.
> >> > >
> >> > > -----Original Message-----
> >> > > From: Xiaobin She <[email protected]>
> >> > > Date: Mon, 6 Feb 2012 16:41:50
> >> > > To: <[email protected]>; 佘晓彬 <[email protected]>
> >> > > Reply-To: [email protected]
> >> > > Subject: Re: Can I write to a compressed file which is located in hdfs?
> >> > >
> >> > > sorry, this sentence is wrong,
> >> > >
> >> > > I can't compress these logs every hour and then put them into hdfs.
> >> > >
> >> > > it should be
> >> > >
> >> > > I can compress these logs every hour and then put them into hdfs.
> >> > >
> >> > >
> >> > > 2012/2/6 Xiaobin She <[email protected]>
> >> > >
> >> > > > hi all,
> >> > > >
> >> > > > I'm testing hadoop and hive, and I want to use them in log analysis.
> >> > > >
> >> > > > Here I have a question: can I write/append logs to a compressed file
> >> > > > which is located in hdfs?
> >> > > >
> >> > > > Our system generates lots of log files every day; I can't compress
> >> > > > these logs every hour and then put them into hdfs.
> >> > > >
> >> > > > But what if I want to write logs into files that are already in hdfs
> >> > > > and are compressed?
> >> > > >
> >> > > > If these files were not compressed, then this job would seem easy, but
> >> > > > how do I write or append logs into a compressed log?
> >> > > >
> >> > > > Can I do that?
> >> > > >
> >> > > > Can anyone give me some advice or some examples?
> >> > > >
> >> > > > Thank you very much!
> >> > > >
> >> > > > xiaobin

--
Adam Brown
Enablement Engineer
Hortonworks <http://www.hadoopsummit.org/>
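P.S. As an illustration of the hourly compression job Bejoy suggests further up the thread (and of the mapred.output.compress / mapred.output.compression.codec settings), below is a minimal sketch of a map-only pass-through MapReduce driver that rewrites an hour of plain-text logs as compressed output. The class name, paths, and GzipCodec choice are assumptions, and it uses the newer org.apache.hadoop.mapreduce API rather than the older mapred.* property names mentioned above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CompressHourlyLogs {

  // Pass-through mapper: drops the byte-offset key so the output is just the log line.
  public static class PassThroughMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(NullWritable.get(), value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "compress-hourly-logs");   // Job.getInstance(conf, ...) on newer builds
    job.setJarByClass(CompressHourlyLogs.class);

    job.setMapperClass(PassThroughMapper.class);
    job.setNumReduceTasks(0);                          // map-only, no shuffle needed
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    // Equivalent of mapred.output.compress=true and mapred.output.compression.codec=<codec>.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    // Hypothetical hourly layout that lines up with a Hive partition
    // like (dt='2012-02-07', hr='14') added via ALTER TABLE ... ADD PARTITION.
    FileInputFormat.addInputPath(job, new Path("/raw/logs/2012-02-07/14"));
    FileOutputFormat.setOutputPath(job, new Path("/warehouse/logs/dt=2012-02-07/hr=14"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A job like this could be kicked off from cron once an hour, after which the matching Alter Table Add Partition statement Bejoy mentions would make the new directory visible to the Hive table.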
