One principle is that the input file must be a sequence of key/value pairs, and you must have an input formatter for that file; otherwise you cannot use MapReduce directly.
In your case, it seems that the input file does not consist of a sequence of pairs, so it may not be suitable for MapReduce.

On 9/27/06, howard chen <[EMAIL PROTECTED]> wrote:
On 9/25/06, Feng Jiang <[EMAIL PROTECTED]> wrote:
> mapreduce doesn't know anything about your application logic. As long as
> you can split the big xml into a lot of small xml files, then hadoop
> could help you.
>
> 1. Split this big xml file into 10000 small xml files, for example.
> 2. Each small xml file could be one pair in a sequence file.
> 3. Then use mapreduce to read the sequence file and parse them; for
>    example, you have 10 map & reduce tasks.
> 4. Finally you have 10 output files, which contain the format you want.

Hello, in my example, XML parsing to CSV seems to be a one-to-one mapping, e.g.

<book>
  <title>hadoop</title>
  <author>peter</author>
  <ISBN>121332</ISBN>
</book>

would become (CSV):

hadoop,peter,121332

Using mapreduce for this seems not suitable? Thanks.
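A one-to-one mapping like this is still fine for MapReduce: it is just a map-only job (or a map with an identity reduce), and the framework's value is parallelizing the parse across the split files. A minimal sketch of the map step, in plain Python rather than the Hadoop API, using the <book> record from the example above:

```python
# Sketch of the map step for the XML -> CSV conversion discussed above.
# Plain Python, not the Hadoop API; in a real job this logic would live in
# the map function, and no reduce step would be needed.
import xml.etree.ElementTree as ET

def map_book(xml_record):
    """Map one <book> XML record to one CSV line (a one-to-one mapping)."""
    book = ET.fromstring(xml_record)
    fields = (
        book.findtext("title"),
        book.findtext("author"),
        book.findtext("ISBN"),
    )
    return ",".join(fields)

record = "<book><title>hadoop</title><author>peter</author><ISBN>121332</ISBN></book>"
print(map_book(record))  # hadoop,peter,121332
```

Each small XML file from step 1 becomes one input record; each map call emits one CSV line, and the job's output files together form the CSV result.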