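It will work as long as your RecordReader accounts for XML tag boundaries. HDFS cuts a file into blocks by byte offset, without looking at the content, so an XML element can span two blocks. The reader for a split should skip forward to the first start tag inside its split and keep reading past the end of the split until it reaches the matching end tag. Reads are not limited to local data: data locality is only a scheduling preference, and the HDFS client fetches the remaining bytes from another DataNode when needed.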
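To illustrate the boundary handling, here is a minimal sketch in plain Java with no Hadoop dependency. It treats a byte array as the file and a pair of offsets as the split. The class name XmlSplitSketch, the readSplit method, and the <record> tag are all hypothetical names chosen for the example, and it assumes records are not nested. A real RecordReader would apply the same rule on top of the stream it opens for its split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class XmlSplitSketch {
    // Hypothetical record delimiters; substitute the tags used in your data.
    static final byte[] START = "<record>".getBytes(StandardCharsets.UTF_8);
    static final byte[] END = "</record>".getBytes(StandardCharsets.UTF_8);

    // Returns the index of the first occurrence of pattern at or after from, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Returns every record whose start tag begins inside [splitStart, splitEnd).
    // A record that begins inside the split is read to its end tag even if that
    // lies beyond splitEnd; a record that began before splitStart is skipped,
    // because the reader of the previous split already owns it.
    static List<String> readSplit(byte[] file, int splitStart, int splitEnd) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        while (true) {
            int open = indexOf(file, START, pos);
            if (open < 0 || open >= splitEnd) {
                break; // no more records start in this split
            }
            int close = indexOf(file, END, open + START.length);
            if (close < 0) {
                break; // truncated record at end of file
            }
            int recordEnd = close + END.length; // may be past splitEnd
            records.add(new String(file, open, recordEnd - open, StandardCharsets.UTF_8));
            pos = recordEnd;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = ("<record>\n<a>1</a>\n</record>\n"
                + "<record>\n<a>2</a>\n</record>\n").getBytes(StandardCharsets.UTF_8);
        int boundary = 14; // falls in the middle of the first record
        System.out.println("split 1: " + readSplit(file, 0, boundary));
        System.out.println("split 2: " + readSplit(file, boundary, file.length));
    }
}
```

With this rule each record is emitted by exactly one split, no matter where the block boundary falls, so the mapper always sees a complete XML record.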
On Tue, Nov 22, 2011 at 9:20 AM, hari708 <[email protected]> wrote:

>
> Hi,
> I have a big file consisting of XML data. The XML is not represented as a
> single line in the file. If we stream this file to a Hadoop directory using
> the ./hadoop dfs -put command, how does the distribution happen?
> Basically, in my MapReduce program I am expecting a complete XML as my
> input. I have a CustomReader (for XML) in my MapReduce job configuration. My
> main confusion is this: if the NameNode distributes data to DataNodes, there
> is a chance that one part of an XML can go to one DataNode and the other half
> to another DataNode. If that is the case, will my custom XMLReader in the
> MapReduce job be able to combine them (as MapReduce reads data locally only)?
> Please help me with this.
> --
> View this message in context:
> http://old.nabble.com/hadoop-File-loading-tp32871902p32871902.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>

--
Best Regards

Jeff Zhang
