Awesome!
On 2010-02-27, at 1:52 PM, "Eric Yang" <ey...@yahoo-inc.com> wrote:
There will be a converter from SequenceFile to the other file format, if a new file format is chosen to replace SequenceFile.
Regards,
Eric
On 2/26/10 8:04 PM, "James Seigel" <ja...@tynt.com> wrote:
It sounds like there is some exciting work being done on the demux process. I was just wondering whether you are planning to stay backwards compatible with the 0.3 format for /repos as you move forward.
Cheers
james
On 2010-02-26, at 10:38 AM, Eric Yang wrote:
On 2/26/10 4:43 AM, "Guillermo Pérez" <bi...@tuenti.com> wrote:
One related thing is that I want to modify the "cluster" where we put the files, because we will receive syslog data with several types of events that we want to store in different clusters so we can analyze, back up, and archive them separately. I have seen that you can modify Record.tagsField, and that a regexp is used to extract the destination cluster. This is a bit awkward, isn't it? I don't want to keep a tagsField just for that. I'm using a field "event_type", and I have modified extraction/engine/RecordUtil.java so that, if that field exists, "event_" + <event_type> is used as the cluster. Is this the proper way to go, or is there a better solution?
I don't think you need to modify RecordUtil.java for this purpose. The backfill Java program takes its first parameter as the cluster. Hence, you could simply pass event_type as the first parameter when you backfill.
Another question: where could I start looking to learn how to build reports and aggregated results from the custom ChukwaRecords I'm inserting?
There is currently no formal solution for generating reports from ChukwaRecords. There is org.apache.hadoop.chukwa.dataloader.MetricDataLoader, which loads ChukwaRecords into a MySQL database based on the mdl.xml file. After the data is loaded, you could use hicc.sh to start the web server and visualize the data in the Chukwa SQL Client widget. However, I must warn you that MetricDataLoader is deprecated, and the future plan for generating reports from ChukwaRecords is as follows:
A post-demux data loader waits to receive new ChukwaRecords files and merges them with the existing ChukwaRecords files through a second MR job. The second MR job also produces a low-resolution version of the data for reports.
/chukwa/repos/TYPE/DATE <-- Original data goes here.
/chukwa/report/TYPE/[yearly,monthly,weekly,daily] <-- Summarized JSON data goes here.
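The proposed layout can be expressed as two path builders. This is only a sketch of the directory scheme listed above: ReportLayout, repoPath, and reportPath are hypothetical names, and the DATE component is passed through unchanged because its exact format is not specified here.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the proposed HDFS layout; paths are plain strings.
 * The DATE format is an assumption left to the caller.
 */
final class ReportLayout {
    // Summary resolutions listed in the proposal.
    static final List<String> RESOLUTIONS =
        Arrays.asList("yearly", "monthly", "weekly", "daily");

    private ReportLayout() {}

    /** Location of the original (full-resolution) data for TYPE and DATE. */
    static String repoPath(String type, String date) {
        return "/chukwa/repos/" + type + "/" + date;
    }

    /** Location of the summarized JSON for TYPE at one of the listed resolutions. */
    static String reportPath(String type, String resolution) {
        if (!RESOLUTIONS.contains(resolution)) {
            throw new IllegalArgumentException("unknown resolution: " + resolution);
        }
        return "/chukwa/report/" + type + "/" + resolution;
    }
}
```

Rejecting unknown resolutions keeps report output confined to the four summary levels named in the plan.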
The report JSON will be fixed at 300 data points per series, optimized for graphing. I am taking it slow on the actual implementation because ChukwaRecords should be moved to a faster serialization format. That is another area that needs to be improved for the future plan to work.
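One way to produce a fixed-size series like the 300-point report is plain bucket averaging: split the series into 300 contiguous, near-equal buckets and average each one. This is an assumed technique chosen for illustration; the plan above does not say which reduction the second MR job would use, and Downsampler is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: reduce a series to a fixed number of points
 * by bucket averaging. The reduction method is an assumption.
 */
final class Downsampler {
    /** Target number of points per series mentioned in the plan. */
    static final int REPORT_POINTS = 300;

    private Downsampler() {}

    /**
     * Returns at most maxPoints values. Series that already fit are
     * returned as a copy; longer series are split into maxPoints
     * contiguous buckets whose averages become the output points.
     */
    static double[] downsample(double[] series, int maxPoints) {
        if (maxPoints <= 0) {
            throw new IllegalArgumentException("maxPoints must be positive");
        }
        if (series.length <= maxPoints) {
            return Arrays.copyOf(series, series.length);
        }
        double[] out = new double[maxPoints];
        for (int i = 0; i < maxPoints; i++) {
            // Bucket i covers [start, end); long arithmetic avoids int overflow.
            int start = (int) ((long) i * series.length / maxPoints);
            int end = (int) ((long) (i + 1) * series.length / maxPoints);
            double sum = 0;
            for (int j = start; j < end; j++) {
                sum += series[j];
            }
            // Every bucket is non-empty because series.length > maxPoints.
            out[i] = sum / (end - start);
        }
        return out;
    }
}
```

Averaging smooths spikes; if peaks matter for graphing, keeping the min and max per bucket would be an alternative design.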
Regards,
Eric
James Seigel
ja...@tynt.com
http://www.tynt.com
Captain Hammer