Hello,

We have the same type of data. We currently convert it to a tab-delimited file and use that as input for streaming.
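A minimal sketch of what such a conversion could look like, assuming each XML file holds repeated record elements with simple child fields. The function name `xml_to_tsv_rows` and the sample element names (`order`, `id`, `customer`, `total`) are hypothetical, chosen only for illustration and not taken from any real schema. Writing the rows from many small XML files into one larger tab-delimited file also gives Hadoop a bigger input to work with.

```python
import xml.etree.ElementTree as ET


def xml_to_tsv_rows(xml_text, record_tag, fields):
    """Yield one tab-delimited line per <record_tag> element in xml_text.

    record_tag and fields are assumptions about the XML layout;
    adjust them to match the real documents.
    """
    root = ET.fromstring(xml_text)
    for rec in root.iter(record_tag):
        values = []
        for name in fields:
            text = rec.findtext(name, default="")
            # Collapse tabs/newlines so each record stays on a single line.
            values.append(" ".join(text.split()))
        yield "\t".join(values)


# Hypothetical sample document, for illustration only.
sample = """<orders>
  <order><id>1</id><customer>Ann Lee</customer><total>9.50</total></order>
  <order><id>2</id><customer>Bob</customer><total>12.00</total></order>
</orders>"""

rows = list(xml_to_tsv_rows(sample, "order", ["id", "customer", "total"]))
for row in rows:
    print(row)
```

Each printed line is one record, so a streaming mapper can read it from standard input and split on the tab character.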
Regards,
Aleksandr

--- On Tue, 5/24/11, Mohit Anchlia <[email protected]> wrote:

From: Mohit Anchlia <[email protected]>
Subject: Processing xml files
To: [email protected]
Date: Tuesday, May 24, 2011, 4:16 PM

I just started learning Hadoop and finished the wordcount MapReduce example. I also briefly looked at Hadoop streaming. Some questions:

1) What should be my first step now? Are there more examples somewhere that I can try out?
2) The second question is about practical usability with XML files. Our XML files are not big, around 120k in size, but Hadoop is really meant for big files, so how do I go about processing these XML files?
3) Are there any samples or advice on how to process XML files?

Looking for help and pointers.
