I think you want to send this to the Hadoop mailing list, but I do
have some code here that I use to parse the Wikipedia dump and that I
can share:

public static void main(String[] args) throws Exception {
    ...
    // Read the input through Hadoop streaming's input format
    conf.setInputFormat(StreamInputFormat.class);
    // Input directory holding the dump
    StreamInputFormat.setInputPaths(conf, new Path("xml"));
    // Use the XML record reader and tell it where a record begins and ends
    conf.set("stream.recordreader.class",
             "org.apache.hadoop.streaming.StreamXmlRecordReader");
    conf.set("stream.recordreader.begin", "<page>");
    conf.set("stream.recordreader.end", "</page>");
    ...
}

In this case the dump is in my xml folder (the input path above) and
each record is delimited by <page> and </page>. Make sure the streaming
jar is on your classpath (it's in hadoop_dir/contrib/streaming).
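
To make the begin/end idea concrete, here is a rough, standalone sketch
of cutting a string into records between two markers. It is only an
illustration and not the code inside StreamXmlRecordReader; the class
name PageSplitter and its split method are made up for this example,
and it assumes each record should keep its opening and closing markers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical helper, not part of Hadoop.
class PageSplitter {

    // Returns every substring that starts at `begin` and runs through the next `end`.
    static List<String> split(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                break; // no more records
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: this sketch simply drops it
            }
            pos = stop + end.length();
            records.add(input.substring(start, pos));
        }
        return records;
    }

    public static void main(String[] args) {
        String dump = "<mediawiki><page><title>A</title></page>\n"
                    + "<page><title>B</title></page></mediawiki>";
        // Each printed record is one <page>...</page> element.
        for (String record : split(dump, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Anything outside a <page>...</page> pair (the <mediawiki> wrapper in the
sample string) is skipped, which is why the markers have to match what
is actually in your dump.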

J-D

On Thu, Aug 27, 2009 at 5:09 PM, llpind<[email protected]> wrote:
>
> Can someone please point me to an XML input format example?  I'm using the
> 0.20 code.  Thanks
> --
> View this message in context: 
> http://www.nabble.com/XML-Input--tp25179786p25179786.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>