Hi
If your log files accumulate to at least one HDFS block size per hour, you
can go ahead as follows:
- run a scheduled job every hour that compresses that hour's log files and
stores them in HDFS (you can use LZO or even Snappy for the compression);
a rough sketch of such a job follows this list
- if your Hive queries analyse this data frequently, store it as PARTITIONED
BY (Date, Hour). While loading into HDFS, also follow a matching directory /
sub-directory structure. Once the data is in HDFS, issue an ALTER TABLE ...
ADD PARTITION statement on the corresponding Hive table.
- in the Hive DDL, use the appropriate input format (Hive already has an
Apache log input format)
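
As a rough sketch, here is what that hourly job could look like. Everything
below (the web_logs table, the /var/log/app path, the file naming, and the
use of lzop for LZO compression) is an illustrative assumption; adapt it to
your own setup.

#!/bin/bash
# Hourly cron job (sketch): compress the previous hour's log, load it
# into HDFS under a dt=/hr= directory, and register the partition with
# Hive. Table name, paths and file naming are assumptions.

DT=$(date -d '1 hour ago' +%Y-%m-%d)    # partition date
HR=$(date -d '1 hour ago' +%H)          # partition hour
LOG="/var/log/app/access-${DT}-${HR}.log"
DIR="/user/hive/warehouse/web_logs/dt=${DT}/hr=${HR}"

lzop "${LOG}"                           # LZO-compress; writes ${LOG}.lzo

hadoop fs -mkdir -p "${DIR}"            # create the partition directory
hadoop fs -put "${LOG}.lzo" "${DIR}/"   # upload the compressed log

# make the new hour's data visible to Hive
hive -e "ALTER TABLE web_logs ADD IF NOT EXISTS
         PARTITION (dt='${DT}', hr='${HR}') LOCATION '${DIR}';"

Note that plain LZO files need to be indexed (the hadoop-lzo project ships
an indexer) before MapReduce can split them, and Snappy files are not
splittable on their own, which is one more reason to aim for roughly one
block of data per hourly file.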
Regards
Bejoy K S
From handheld, Please excuse typos.
-----Original Message-----
From: Xiaobin She <[email protected]>
Date: Mon, 6 Feb 2012 16:41:50
To: <[email protected]>; 佘晓彬<[email protected]>
Reply-To: [email protected]
Subject: Re: Can I write to a compressed file which is located in HDFS?
sorry, this sentence is wrong,
I can't compress these logs every hour and then put them into HDFS.
it should be
I can compress these logs every hour and then put them into HDFS.
2012/2/6 Xiaobin She <[email protected]>
>
> hi all,
>
> I'm testing Hadoop and Hive, and I want to use them for log analysis.
>
> Here I have a question: can I write/append logs to a compressed file
> which is located in HDFS?
>
> Our system generates lots of log files every day. I can't compress these
> logs every hour and then put them into HDFS.
>
> But what if I want to write logs into files that are already in HDFS
> and are compressed?
>
> If these files were not compressed, this job would seem easy, but how do
> I write or append logs to a compressed file?
>
> Can I do that?
>
> Can anyone give me some advice or some examples?
>
> Thank you very much!
>
> xiaobin
>