Thank you Bejoy, I will take a look at that book. Thanks again!
2012/2/7 <[email protected]> > ** > Hi > AFAIK I don't think it is possible to append into a compressed file. > > If you have files in hdfs on a dir and you need to compress the same (like > files for an hour) you can use MapReduce to do that by setting > mapred.output.compress = true and > mapred.output.compression.codec='theCodecYouPrefer' > You'd get the blocks compressed in the output dir. > > You can use the API to read from standard input like > -get hadoop conf > -register the required compression codec > -write to CompressionOutputStream. > > You should get a well detailed explanation on the same from the book > 'Hadoop - The definitive guide' by Tom White. > Regards > Bejoy K S > > From handheld, Please excuse typos. > ------------------------------ > *From: * Xiaobin She <[email protected]> > *Date: *Tue, 7 Feb 2012 14:24:01 +0800 > *To: *<[email protected]>; <[email protected]>; David > Sinclair<[email protected]> > *Subject: *Re: Can I write to an compressed file which is located in hdfs? > > hi Bejoy and David, > > thank you for you help. > > So I can't directly write logs or append logs into an compressed file in > hdfs, right? > > Can I compress an file which is already in hdfs and has not been > compressed? > > If I can , how can I do that? > > Thanks! > > > > 2012/2/6 <[email protected]> > >> Hi >> I agree with David on the point, you can achieve step 1 of my >> previous response with flume. ie load real time inflow of data in >> compressed format into hdfs. You can specify a time interval or data size >> in flume collector that determines when to flush data on to hdfs. >> >> Regards >> Bejoy K S >> >> From handheld, Please excuse typos. >> >> -----Original Message----- >> From: David Sinclair <[email protected]> >> Date: Mon, 6 Feb 2012 09:06:00 >> To: <[email protected]> >> Cc: <[email protected]> >> Subject: Re: Can I write to an compressed file which is located in hdfs? >> >> Hi, >> >> You may want to have a look at the Flume project from Cloudera. I use it >> for writing data into HDFS. >> >> https://ccp.cloudera.com/display/SUPPORT/Downloads >> >> dave >> >> 2012/2/6 Xiaobin She <[email protected]> >> >> > hi Bejoy , >> > >> > thank you for your reply. >> > >> > actually I have set up an test cluster which has one namenode/jobtracker >> > and two datanode/tasktracker, and I have make an test on this cluster. >> > >> > I fetch the log file of one of our modules from the log collector >> machines >> > by rsync, and then I use hive command line tool to load this log file >> into >> > the hive warehouse which simply copy the file from the local >> filesystem to >> > hdfs. >> > >> > And I have run some analysis on these data with hive, all this run well. >> > >> > But now I want to avoid the fetch section which use rsync, and write the >> > logs into hdfs files directly from the servers which generate these >> logs. >> > >> > And it seems easy to do this job if the file locate in the hdfs is not >> > compressed. >> > >> > But how to write or append logs to an file that is compressed and >> located >> > in hdfs? >> > >> > Is this possible? >> > >> > Or is this an bad practice? >> > >> > Thanks! >> > >> > >> > >> > 2012/2/6 <[email protected]> >> > >> > > Hi >> > > If you have log files enough to become at least one block size in >> an >> > > hour. 
> ------------------------------
> From: Xiaobin She <[email protected]>
> Date: Tue, 7 Feb 2012 14:24:01 +0800
> To: <[email protected]>; <[email protected]>; David Sinclair <[email protected]>
> Subject: Re: Can I write to a compressed file which is located in HDFS?
>
> hi Bejoy and David,
>
> thank you for your help.
>
> So I can't directly write or append logs to a compressed file in HDFS,
> right?
>
> Can I compress a file which is already in HDFS and has not been
> compressed yet? If so, how can I do that?
>
> Thanks!
>
>
> 2012/2/6 <[email protected]>
>
>> Hi
>>
>> I agree with David on that point; you can achieve step 1 of my
>> previous response with Flume, i.e. load a real-time inflow of data into
>> HDFS in compressed format. You can specify a time interval or data size
>> in the Flume collector that determines when to flush data onto HDFS.
>>
>> Regards
>> Bejoy K S
>>
>> From handheld, Please excuse typos.
>>
>> -----Original Message-----
>> From: David Sinclair <[email protected]>
>> Date: Mon, 6 Feb 2012 09:06:00
>> To: <[email protected]>
>> Cc: <[email protected]>
>> Subject: Re: Can I write to a compressed file which is located in HDFS?
>>
>> Hi,
>>
>> You may want to have a look at the Flume project from Cloudera. I use it
>> for writing data into HDFS.
>>
>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>
>> dave
>>
>> 2012/2/6 Xiaobin She <[email protected]>
>>
>>> hi Bejoy,
>>>
>>> thank you for your reply.
>>>
>>> Actually I have set up a test cluster which has one namenode/jobtracker
>>> and two datanode/tasktrackers, and I have run a test on this cluster.
>>>
>>> I fetch the log file of one of our modules from the log collector
>>> machines by rsync, and then I use the Hive command line tool to load
>>> this log file into the Hive warehouse, which simply copies the file
>>> from the local filesystem to HDFS.
>>>
>>> I have run some analysis on this data with Hive, and all of it works
>>> well.
>>>
>>> But now I want to avoid the fetch step which uses rsync, and write the
>>> logs into HDFS files directly from the servers which generate them.
>>>
>>> It seems easy to do this if the file located in HDFS is not compressed.
>>>
>>> But how do I write or append logs to a file that is compressed and
>>> located in HDFS? Is this possible? Or is this bad practice?
>>>
>>> Thanks!
>>>
>>>
>>> 2012/2/6 <[email protected]>
>>>
>>>> Hi
>>>>
>>>> If you have enough log files to make up at least one block size in an
>>>> hour, you can go ahead as follows:
>>>> - run a scheduled job every hour that compresses the log files for
>>>> that hour and stores them onto HDFS (you can use LZO or even Snappy
>>>> to compress)
>>>> - if Hive does more frequent analysis on this data, store it
>>>> PARTITIONED BY (Date, Hour). While loading into HDFS, also follow a
>>>> directory / sub-directory structure. Once the data is in HDFS, issue
>>>> an ALTER TABLE ... ADD PARTITION statement on the corresponding Hive
>>>> table.
>>>> - in the Hive DDL use the appropriate input format (Hive already has
>>>> an Apache log input format)
>>>>
>>>> Regards
>>>> Bejoy K S
>>>>
>>>> From handheld, Please excuse typos.
>>>>
>>>> -----Original Message-----
>>>> From: Xiaobin She <[email protected]>
>>>> Date: Mon, 6 Feb 2012 16:41:50
>>>> To: <[email protected]>; 佘晓彬 <[email protected]>
>>>> Reply-To: [email protected]
>>>> Subject: Re: Can I write to a compressed file which is located in HDFS?
>>>>
>>>> sorry, that sentence was wrong.
>>>>
>>>> I wrote:
>>>> I can't compress these logs every hour and then put them into hdfs.
>>>>
>>>> It should be:
>>>> I can compress these logs every hour and then put them into hdfs.
>>>>
>>>>
>>>> 2012/2/6 Xiaobin She <[email protected]>
>>>>
>>>>> hi all,
>>>>>
>>>>> I'm testing Hadoop and Hive, and I want to use them for log analysis.
>>>>>
>>>>> I have a question: can I write/append logs to a compressed file
>>>>> which is located in HDFS?
>>>>>
>>>>> Our system generates lots of log files every day. I can't compress
>>>>> these logs every hour and then put them into hdfs.
>>>>>
>>>>> But what if I want to write logs into files that are already in HDFS
>>>>> and are compressed?
>>>>>
>>>>> If these files were not compressed, this job would seem easy, but
>>>>> how do I write or append logs into a compressed file?
>>>>>
>>>>> Can I do that? Can anyone give me some advice or some examples?
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> xiaobin
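Tying the pieces of the thread together: the hourly compression job Bejoy describes upthread could be a map-only "identity" job that rewrites an hour of raw text logs with compressed output, using the two mapred.* settings from his later mail. A sketch; the class names and paths are invented, and gzip stands in for whichever codec you prefer:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CompressHourlyLogs {

      // Pass each line through unchanged; the NullWritable key keeps
      // TextOutputFormat from prepending the byte offset to every line.
      public static class CatMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable key, Text line,
                        OutputCollector<NullWritable, Text> out, Reporter r)
            throws IOException {
          out.collect(NullWritable.get(), line);
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CompressHourlyLogs.class);
        conf.setJobName("compress-hourly-logs");
        conf.setMapperClass(CatMapper.class);
        conf.setNumReduceTasks(0);                 // map-only
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);

        // The two settings from Bejoy's mail.
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec",
                      GzipCodec.class, CompressionCodec.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // raw hour dir
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // compressed dir
        JobClient.runJob(conf);
      }
    }

Scheduled once an hour (cron, for instance) with the hour's input and output directories as arguments, this produces one compressed file per input split in the output dir.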

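The PARTITIONED BY (Date, Hour) layout plus ALTER TABLE ... ADD PARTITION from Bejoy's first reply might look roughly like this in Hive DDL; the table name, column, and locations are made up for the sketch:

    -- One partition per hour, pointed at the hour's directory in HDFS.
    CREATE TABLE access_logs (line STRING)
    PARTITIONED BY (dt STRING, hr STRING)
    STORED AS TEXTFILE;

    -- After each compressed hourly file lands in its dt/hr directory:
    ALTER TABLE access_logs ADD PARTITION (dt = '2012-02-07', hr = '14')
    LOCATION '/logs/compressed/2012-02-07/14';

Hive reads gzip-compressed TEXTFILE data transparently; note that plain gzip files are not splittable, which is one reason indexed LZO can be a better fit for large hourly files.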