At 2010-08-25 03:39:11, "hdev ml" <[email protected]> wrote:
Hi all,
This question relates partly to Hadoop and partly to Chukwa.
We have a huge amount of logged information sitting on one machine. I am not sure
whether it is stored in multiple files or in a database.
What we want to do is take that log information, transform it, and store it
in some database for data mining / data warehousing / reporting purposes.
1. Since the data is on one machine, is Chukwa the right kind of framework for
this ETL process?
2. I understand that Hadoop generally works on large files. But assuming the
data sits in a database, what if we somehow partition the data for
Hadoop/Chukwa (see the rough sketch below of what we mean)? Is that the right strategy?
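For concreteness, here is a rough sketch of the kind of partitioning we have in
mind: streaming rows out of a database table into a few large flat files that
Hadoop/Chukwa could then ingest. The table name "logs", the column names, and
the chunk size are just placeholders, since we do not yet know how the data is
actually stored.

import sqlite3

ROWS_PER_FILE = 1000000  # aim for a few large files, since HDFS prefers them

def export_partitions(db_path, out_prefix):
    # Stream rows out of the database and roll over to a new flat file
    # every ROWS_PER_FILE rows, so the result is a small number of big files.
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT ts, level, message FROM logs ORDER BY ts")
    part = 0
    count = 0
    out = open("%s-part-%05d.tsv" % (out_prefix, part), "w")
    for ts, level, message in cursor:
        if count == ROWS_PER_FILE:
            out.close()
            part += 1
            count = 0
            out = open("%s-part-%05d.tsv" % (out_prefix, part), "w")
        out.write("%s\t%s\t%s\n" % (ts, level, message))
        count += 1
    out.close()
    conn.close()

if __name__ == "__main__":
    # e.g. produces logdata-part-00000.tsv, logdata-part-00001.tsv, ...
    export_partitions("logs.db", "logdata")

The idea would be to copy the resulting files into HDFS (e.g. with
"hadoop fs -put") before running any processing over them.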
Any help will be appreciated.
Thanks,
Harshad