We ended up subclassing TextInputFormat and adding a custom RecordReader that
begins and ends each record on the record's opening and closing tags. The
StreamXmlRecordReader class in Hadoop streaming is a good reference for this.
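
As an illustration only (this is neither the reader described above nor
Hadoop's own code), the tag-scanning core of such a RecordReader might look
like the sketch below. The class name XmlRecordScanner, its methods, and the
<rec> tag are hypothetical, and it uses plain java.io so it runs without Hadoop
on the classpath. It assumes the start tag is written literally (no attributes)
and that the tag's first byte does not recur inside the tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pulls complete <tag>...</tag> records out of a byte stream.
class XmlRecordScanner {
    private final InputStream in;
    private final byte[] startTag;
    private final byte[] endTag;

    XmlRecordScanner(InputStream in, String startTag, String endTag) {
        this.in = in;
        this.startTag = startTag.getBytes(StandardCharsets.UTF_8);
        this.endTag = endTag.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the next record including both tags, or null when no complete record remains.
    String nextRecord() throws IOException {
        // Skip everything up to and including the next start tag.
        if (!readUntilMatch(startTag, null)) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(startTag, 0, startTag.length);
        // Copy bytes into the record until the end tag has been seen.
        if (!readUntilMatch(endTag, buf)) {
            return null; // stream ended mid-record; drop the partial record
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads byte by byte until `match` is seen; copies what it reads to `out` if non-null.
    // Simple matcher: assumes match[0] does not occur again inside `match`.
    private boolean readUntilMatch(byte[] match, ByteArrayOutputStream out) throws IOException {
        int i = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                return false; // end of stream before a full match
            }
            if (out != null) {
                out.write(b);
            }
            if (b == (match[i] & 0xff)) {
                i++;
                if (i == match.length) {
                    return true;
                }
            } else {
                // Restart the match; the current byte may itself begin a new match.
                i = (b == (match[0] & 0xff)) ? 1 : 0;
            }
        }
    }

    // Convenience wrapper for in-memory input.
    static List<String> splitRecords(String xml, String startTag, String endTag) {
        XmlRecordScanner scanner = new XmlRecordScanner(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), startTag, endTag);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = scanner.nextRecord()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // The trailing "<rec>cut" has no end tag, so it is not emitted.
        for (String r : splitRecords("junk<rec>a</rec>\n<rec>b</rec><rec>cut", "<rec>", "</rec>")) {
            System.out.println(r);
        }
    }
}
```

Inside a real RecordReader the same loop would read from the split's input
stream, and it would presumably also need to stop looking for new start tags
once the read position passes the end of the split, while still finishing any
record that began inside it.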



Prasan Ary wrote:
Hi All,
  I am writing a Java implementation of my map/reduce function on Hadoop.
  The input is an XML file, and the map function has to process well-formed
XML records. So far I have been unable to split the XML file at XML record
boundaries to feed into my map function.
  Can anybody point me to resources that explain how to force file splits at
a desired boundary?
thx,
  Pra.

