The only real problem with XML and map-reduce comes up when you are talking about one gargantuan XML file. That makes correct splitting difficult, because a split boundary can land in the middle of an element.
If you are talking about millions or billions of small XML files (stored in some sort of container file), then Hadoop should be pretty easy to use.

On 4/28/08 9:39 AM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:

> Hello
>
> Has anyone had any experience with processing xml files within Hadoop within
> their maps/reduces?
> In particular, has anyone used any sort of XQuery/XPath processing within
> their maps/reduces?
> Say I have XML string passed to the map and now I want to find something in
> particular via XQuery/XPath or some sort to run numbers on occurrences or
> parse out a particular section within the XML.
>
> Anyone done any XML processing looking for things within XML? Then, aggregate
> common pieces together in the reduces?
>
> On another note,
> Has anyone figured out splits for XML files?
> Has anyone written a custom XML reader other than the StreamXmlRecordReader?
> The only one I've read about and can find anything is:
> http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html
>
> Thanks.
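
For the many-small-documents case, the per-record work inside a map could look roughly like the sketch below. It is only an illustration, assuming each map input value is one complete XML document held in a string. The class name XPathCount and the method countMatches are made up for this example and are not part of Hadoop; the XPath calls come from the standard javax.xml.xpath package.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not a Hadoop class.
class XPathCount {

    // Counts the nodes in one XML document that match the given XPath expression.
    // Assumes the string holds a single well-formed document.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            // Malformed XML and invalid expressions both surface here.
            throw new IllegalArgumentException("bad XML or XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String record = "<order><item sku=\"a\"/><item sku=\"b\"/></order>";
        System.out.println(countMatches(record, "/order/item")); // prints 2
    }
}
```

A map function could call countMatches on its input value and emit the expression as the key with the count as the value, leaving the reduce to sum the counts per key. Building a new XPath object for every record is simple but not cheap, so reusing one per map task is probably worth it for large jobs.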
