I think you want to send this to the hadoop mailing list. But I do
have some code here that I use to parse the Wikipedia dump that I
can share:

    public static void main(String[] args) throws Exception {
      ...
      // Stream stuff: read the input as XML records via the streaming reader
      conf.setInputFormat(StreamInputFormat.class);
      StreamInputFormat.setInputPaths(conf, new Path("xml"));
      conf.set("stream.recordreader.class",
          "org.apache.hadoop.streaming.StreamXmlRecordReader");
      // Each record starts at <page> and ends at </page>
      conf.set("stream.recordreader.begin", "<page>");
      conf.set("stream.recordreader.end", "</page>");
      ...
    }

In this case the dump is in my "xml" folder (the path passed to
setInputPaths) and each entry is delimited by <page> and </page>.
Make sure the streaming jar is in your classpath (it's in
hadoop_dir/contrib/streaming).
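
To give a rough idea of what the begin/end settings mean for the
records you get, here is a small standalone sketch in plain Java. It
is only an illustration, not how StreamXmlRecordReader is implemented:
PageSplitter and split are made-up names, the sample input is invented,
and I am assuming each record keeps its surrounding <page> and </page>
tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Hadoop).
class PageSplitter {

    // Returns every chunk of input that starts at `begin` and runs
    // through the next occurrence of `end`, tags included.
    static List<String> split(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(begin, from);
            if (start < 0) {
                break; // no more records
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: skip it
            }
            records.add(input.substring(start, stop + end.length()));
            from = stop + end.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // Invented sample standing in for a piece of the dump.
        String dump = "<mediawiki>"
            + "<page><title>A</title></page>\n"
            + "<page><title>B</title></page>"
            + "</mediawiki>";
        for (String record : split(dump, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Anything outside a begin/end pair (the <mediawiki> wrapper in the
sample) is dropped, so your mapper only has to deal with one page at a
time.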
J-D
On Thu, Aug 27, 2009 at 5:09 PM, llpind<[email protected]> wrote:
>
> Can someone please point me to a XML input format example. I'm using .20
> code. Thanks
> --
> View this message in context:
> http://www.nabble.com/XML-Input--tp25179786p25179786.html
> Sent from the HBase User mailing list archive at Nabble.com.