There's a StreamXmlRecordReader class in contrib/streaming that looks like it will chunk up an XML file based on XML tags. I haven't used it myself, though.
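
To illustrate the general idea of chunking on tags, here is a minimal plain-Java sketch that cuts a string into records using a begin tag and an end tag. It is not the StreamXmlRecordReader code and does not use any Hadoop API; the class name XmlRecordSplitter, the method splitRecords, and the <rec> tag are made up for the example. It assumes records are not nested inside records with the same tag, and that the begin tag appears literally (no attributes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: not part of Hadoop.
public class XmlRecordSplitter {

    // Returns every substring that starts at beginTag and ends at the
    // first endTag that follows it. Text between records is skipped.
    public static List<String> splitRecords(String xml, String beginTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = xml.indexOf(beginTag, pos);
            if (start < 0) {
                break; // no more records
            }
            int end = xml.indexOf(endTag, start + beginTag.length());
            if (end < 0) {
                break; // trailing record is incomplete, so drop it
            }
            end += endTag.length();
            records.add(xml.substring(start, end));
            pos = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<docs><rec><id>1</id></rec>\n<rec><id>2</id></rec></docs>";
        for (String record : splitRecords(xml, "<rec>", "</rec>")) {
            System.out.println(record);
        }
    }
}
```

In a real job the same scan would have to run over a byte stream inside a record reader, and a record that straddles a split boundary would need to be read to its end tag by whichever reader sees its begin tag. The sketch above ignores both concerns.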
-----Original Message-----
From: Prasan Ary [mailto:[EMAIL PROTECTED]
Sent: Monday, March 03, 2008 3:30 PM
To: [email protected]
Subject: map/reduce function on xml string

Hi All,
I am writing a Java implementation for my map/reduce function on Hadoop. The input is an XML file, and the map function has to process well-formed XML records. So far I have been unable to split the XML file at XML record boundaries to feed into my map function.
Can anybody point me to resources that explain how to force a file split at a desired boundary?

thx,
Pra.
