One principle is that the input file must be a sequence of key/value pairs,
and you must have an input format for that file; otherwise you cannot use
MapReduce directly.

In your case, it seems that the input file does not consist of a sequence
of pairs, so it may not be suitable for MapReduce as-is.
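To make the idea of "a sequence of pairs" concrete, here is a minimal sketch in plain Java (no Hadoop dependencies, class and method names are my own invention) that splits one big XML file into individual records, each of which could become one key/value pair. In a real job these pairs would then be written out with Hadoop's SequenceFile writer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlToPairs {
    // Split the document into individual <book>...</book> records.
    static List<String> records(String bigXml) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("<book>.*?</book>", Pattern.DOTALL)
                           .matcher(bigXml);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String bigXml = "<books>"
            + "<book><title>a</title></book>"
            + "<book><title>b</title></book>"
            + "</books>";
        // Each record gets a sequential key, forming (key, record) pairs.
        int key = 0;
        for (String record : records(bigXml)) {
            System.out.println(key++ + "\t" + record);
        }
    }
}
```

This is only the splitting step; the resulting pairs are what an input format hands to the map tasks.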

On 9/27/06, howard chen <[EMAIL PROTECTED]> wrote:

On 9/25/06, Feng Jiang <[EMAIL PROTECTED]> wrote:
> MapReduce doesn't know anything about your application logic. As long as
> you can split the big XML into a lot of small XML files, then Hadoop
> could help you.
>
> 1. Split this big XML file into 10000 small XML files, for example.
> 2. Each small XML file could be one pair in a sequence file.
> 3. Then use MapReduce to read the sequence file and parse them; for
> example, you have 10 map & reduce tasks.
> 4. Finally you have 10 output files, which contain the format you want.
>
>


Hello,

in my example, parsing XML to CSV seems to be a one-to-one mapping, e.g.

<book>
    <title>hadoop</title>
    <author>peter</author>
    <ISBN>121332</ISBN>
</book>

would become (CSV)
hadoop,peter,121332
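
The one-to-one transformation above can be sketched like this (plain Java, hypothetical helper names; in a real job this logic would sit inside a map() that receives one <book> record as its value):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BookToCsv {
    // Extract the text between <tag> and </tag>; returns "" if absent.
    static String field(String xml, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">",
                                    Pattern.DOTALL).matcher(xml);
        return m.find() ? m.group(1).trim() : "";
    }

    // One <book> record in, one CSV line out.
    static String toCsv(String xml) {
        return field(xml, "title") + "," + field(xml, "author") + ","
               + field(xml, "ISBN");
    }

    public static void main(String[] args) {
        String record = "<book>\n"
            + "    <title>hadoop</title>\n"
            + "    <author>peter</author>\n"
            + "    <ISBN>121332</ISBN>\n"
            + "</book>";
        System.out.println(toCsv(record));  // prints: hadoop,peter,121332
    }
}
```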

so using MapReduce seems not suitable?

thanks.
