On Tue, May 24, 2011 at 4:25 PM, Aleksandr Elbakyan <[email protected]> wrote:
> Hello,
>
> We have the same type of data. We currently convert it to a tab-delimited file
> and use it as input for streaming.
>

Can you please give more info?
Do you append the data from multiple xml files into one file, one line per
xml file? Or some other way? If so, how big do you let the files get?
How do you create these files, assuming your xml is stored somewhere
else, in a DB or filesystem? Do you read them one by one?
What are your experiences using text files instead of xml?
Is there a reason why xml files can't or shouldn't be used directly in hadoop?
Any performance implications?
Any suggested reading in this area?

Our xml is something like:

  <column id="Name" security="sensitive" xsi:type="Text">
   <value>free a last</value>
  </column>
  <column id="age" security="no" xsi:type="Text">
   <value>40</value>
  </column>
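
To make the tab-delimited idea concrete, here is a minimal sketch of how one such record might be flattened into a single line. It is only an illustration under assumptions: the <record> root element, the xsi namespace declaration, and the function name record_to_row are hypothetical and added so that the fragment parses as a complete document. It is not a description of Aleksandr's actual conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the <record> root and the xsi namespace declaration
# are assumptions, added so the fragment parses as a complete document.
SAMPLE = """<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <column id="Name" security="sensitive" xsi:type="Text">
   <value>free a last</value>
  </column>
  <column id="age" security="no" xsi:type="Text">
   <value>40</value>
  </column>
</record>"""


def record_to_row(xml_text, fields):
    """Return the <value> of each requested column id as one tab-delimited line."""
    root = ET.fromstring(xml_text)
    values = {}
    for column in root.iter("column"):
        text = column.findtext("value", default="")
        # Collapse whitespace so embedded tabs or newlines cannot break the line format.
        values[column.get("id")] = " ".join(text.split())
    # Missing columns become empty fields so every line has the same width.
    return "\t".join(values.get(name, "") for name in fields)


if __name__ == "__main__":
    print(record_to_row(SAMPLE, ["Name", "age"]))
```

Writing one line per record into a larger file is one way to avoid handing hadoop many small 120k inputs, at the cost of a conversion step up front.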

And we would, for example, want to know how many customers are above a
certain age, or of a certain age with a certain income, etc.
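
A query like the first one could be sketched as a hadoop streaming mapper and reducer, which read lines on stdin and write tab-separated key/value lines on stdout. This is a rough sketch under assumptions, not a tested job: it assumes the input is already tab-delimited text with the age in the second field, and AGE_COLUMN, MIN_AGE, and the "above_age" key are hypothetical names chosen for illustration.

```python
import sys

AGE_COLUMN = 1   # assumed position of "age" in each tab-delimited line
MIN_AGE = 30     # hypothetical threshold for "above a certain age"


def map_lines(lines, min_age=MIN_AGE):
    """Yield 'above_age<TAB>1' for every record whose age exceeds min_age."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            age = int(fields[AGE_COLUMN])
        except (IndexError, ValueError):
            continue  # skip records with a missing or non-numeric age
        if age > min_age:
            yield "above_age\t1"


def reduce_lines(lines):
    """Sum the counts emitted by the mapper, per key."""
    totals = {}
    for line in lines:
        key, _, count = line.rstrip("\n").partition("\t")
        totals[key] = totals.get(key, 0) + int(count)
    return totals


if __name__ == "__main__":
    # Run with "map" as the streaming mapper and "reduce" as the reducer.
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    if mode == "map":
        for out in map_lines(sys.stdin):
            print(out)
    else:
        for key, total in sorted(reduce_lines(sys.stdin).items()):
            print(f"{key}\t{total}")
```

A combined condition such as age plus income would follow the same pattern, with one more field read in the mapper.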

Sorry for all the questions. I am new and trying to get a grasp of this and
also learn how I would actually solve our use case.

> Regards,
> Aleksandr
>
> --- On Tue, 5/24/11, Mohit Anchlia <[email protected]> wrote:
>
> From: Mohit Anchlia <[email protected]>
> Subject: Processing xml files
> To: [email protected]
> Date: Tuesday, May 24, 2011, 4:16 PM
>
> I just started learning hadoop and got done with the wordcount mapreduce
> example. I also briefly looked at hadoop streaming.
>
> Some questions:
> 1) What should be my first step now? Are there more examples
> somewhere that I can try out?
> 2) Second question is around practical usability using xml files. Our
> xml files are not big, they are around 120k in size, but hadoop is
> really meant for big files, so how do I go about processing these xml
> files?
> 3) Are there any samples or advice on how to process xml files?
>
>
> Looking for help and pointers.
>
