Re: Processing xml documents using StreamXmlRecordReader

Mohammad Tariq Tue, 19 Jun 2012 04:21:03 -0700

My driver function looks like this -

public static void main(String[] args)
        throws IOException, InterruptedException, ClassNotFoundException {
    // TODO Auto-generated method stub

    Configuration conf = new Configuration();
    Job job = new Job();
    conf.set("stream.recordreader.class",
            "org.apache.hadoop.streaming.StreamXmlRecordReader");
    conf.set("stream.recordreader.begin", "<info>");
    conf.set("stream.recordreader.end", "</info>");
    job.setInputFormatClass(StreamInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/mapin/demo.xml"));
    FileOutputFormat.setOutputPath(job, new Path("/mapout/demo"));
    job.waitForCompletion(true);
}

Could you please point out my mistake?

Regards,
    Mohammad Tariq


On Tue, Jun 19, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Hello Madhu,
>
>             Thanks for the response. Actually, I was trying to use the
> new API (Job). Have you tried that? I was not able to set the
> InputFormat using the Job API.
>
> Regards,
>     Mohammad Tariq
>
>
> On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <phatak....@gmail.com> wrote:
>> Hi,
>>  Set the following properties in the driver class:
>>
>>   jobConf.set("stream.recordreader.class",
>>       "org.apache.hadoop.streaming.StreamXmlRecordReader");
>>   jobConf.set("stream.recordreader.begin", "start-tag");
>>   jobConf.set("stream.recordreader.end", "end-tag");
>>   jobConf.setInputFormat(StreamInputFormat.class);
>>
>>  In the Mapper, each XML record arrives as the key, of type Text, so your
>> mapper will look like
>>
>>   public class MyMapper<K,V>  implements Mapper<Text,Text,K,V>
>>
>>
>> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>> Hello list,
>>>
>>>        Could anyone who has written MapReduce jobs to process XML
>>> documents stored in their cluster using "StreamXmlRecordReader" share
>>> his/her experience? Or could you provide me with some pointers
>>> on that? Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>
>>
>>
>>
>> --
>> https://github.com/zinnia-phatak-dev/Nectar
>>
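For readers following the thread, here is a rough, plain-Java illustration of what the stream.recordreader.begin and stream.recordreader.end settings are meant to do: cut the input into records that run from a begin marker to the next end marker. This is not the Hadoop implementation. The class name TagRecordSplitter and the method split are hypothetical, and keeping the markers inside each record is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics begin/end tag record splitting.
public class TagRecordSplitter {

    // Returns every span that starts at `begin` and runs through the next `end`.
    // Assumption: the markers themselves are kept as part of each record.
    static List<String> split(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(begin, from);
            if (start < 0) {
                break; // no further begin marker
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: dropped in this sketch
            }
            stop += end.length();
            records.add(input.substring(start, stop));
            from = stop; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><info>a</info>\n<info>b</info></root>";
        for (String record : split(xml, "<info>", "</info>")) {
            System.out.println(record);
        }
    }
}
```

With the <info> and </info> markers from the driver above, each <info>...</info> element would become one record, and anything outside those markers (such as the enclosing <root> element in the sample string) is skipped.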
