I use the StreamXmlRecordReader from the streaming contrib package; it works very well. Your key becomes the stanza you are looking for.
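
Roughly, the reader is given a begin marker and an end marker, and it hands everything from one to the other to the mapper as a single record. Here is a minimal plain-Java sketch of that scanning idea. It is not the Hadoop class and does not deal with splits or block boundaries; StanzaScanner and extractStanzas are names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Hadoop: it only illustrates the
// begin/end scanning idea on an in-memory string.
class StanzaScanner {

    /** Returns every substring that starts at beginTag and ends with the next endTag. */
    static List<String> extractStanzas(String text, String beginTag, String endTag) {
        List<String> stanzas = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(beginTag, from);
            if (start < 0) {
                break; // no further stanza begins
            }
            int end = text.indexOf(endTag, start + beginTag.length());
            if (end < 0) {
                break; // unterminated stanza: this sketch simply drops it
            }
            end += endTag.length();
            stanzas.add(text.substring(start, end));
            from = end; // continue scanning after the stanza just emitted
        }
        return stanzas;
    }

    public static void main(String[] args) {
        String xml = "<doc><rec id=\"1\">a</rec>\n<rec id=\"2\">b</rec></doc>";
        // Each printed line is what would arrive as one record (the key).
        for (String stanza : extractStanzas(xml, "<rec", "</rec>")) {
            System.out.println(stanza);
        }
    }
}
```

With streaming, as far as I remember, the two markers are passed as the begin= and end= parameters of the -inputreader option; check the streaming documentation for your version.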

On Sat, Oct 31, 2009 at 7:38 AM, Oliver B. Fischer <o.b.fisc...@swe-blog.net> wrote:

> Hello Jeff,
>
> does this mean that there is no programmatic way to define where
> a logical file will be split, independent of the distribution of its
> blocks in HDFS?
>
> Regards
>
> Oliver
>
> Jeff Zhang wrote:
> > Hi Steve,
> >
> > When you want to read XML, you should provide your own custom
> > InputFormat which extends FileInputFormat,
> >
> > and override the method isSplitable so that the file is not split;
> > that means one XML file per mapper.
> >
> >     protected boolean isSplitable(FileSystem fs, Path filename) {
> >         return false;
> >     }