Hi,

I'm not aware of any well-made XML InputFormat or RecordReader. To the best of my knowledge, StreamXmlRecordReader ( http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamXmlRecordReader.html ) in Hadoop Streaming is the only existing solution.
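
To illustrate the general idea, here is a rough Python sketch of how a reader based on begin/end markers can avoid breaking an element across splits. As I understand it, StreamXmlRecordReader is configured with a begin string and an end string in a similar spirit, but this is not Hadoop's code: the function name `read_xml_records`, the `<rec>` tag, and the sample data are all made up for the example. It assumes records are not nested and that the begin tag carries no attributes.

```python
def read_xml_records(data, split_start, split_end, begin, end):
    """Yield complete begin..end records whose begin tag starts inside
    the half-open byte range [split_start, split_end).

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    pos = split_start
    while True:
        b = data.find(begin, pos)
        # A record belongs to this split only if its begin tag starts
        # before the split ends; later records belong to the next split.
        if b == -1 or b >= split_end:
            return
        e = data.find(end, b + len(begin))
        if e == -1:
            return  # unterminated record: nothing complete to emit
        e += len(end)
        # The record may run past split_end; it is read in full anyway,
        # so no element is ever cut in half.
        yield data[b:e]
        pos = e


data = b"<docs><rec>a</rec><rec>bb</rec><rec>ccc</rec></docs>"
boundary = 22  # falls in the middle of the second <rec> element

first = list(read_xml_records(data, 0, boundary, b"<rec>", b"</rec>"))
second = list(read_xml_records(data, boundary, len(data), b"<rec>", b"</rec>"))
print(first)
print(second)
```

The rule is that a split owns every record whose begin tag starts inside it and reads past its own end to finish the last one, while the next split skips forward to the first begin tag it finds. That way each element is emitted exactly once, wherever the block boundary falls.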
Good luck!

--
Hyunsik Choi
Database & Information Systems Group, Korea University
http://diveintodata.org

On Thu, Jul 30, 2009 at 5:30 PM, Wasim Bari<wasimb...@msn.com> wrote:
> Hi All,
>
> I am looking to store some really big XML files in HDFS and then process
> them using MapReduce.
>
> Do we have some utility which uploads the XML files to HDFS, making sure the
> split of a file into blocks doesn't break an element (I mean half an element
> in one block and half in another)?
>
> Any suggestions to work this out will be greatly appreciated.
>
> Thanks
>
> Bari