Hi, Chris,

 

Thanks for adding the map-side join feature
(http://issues.apache.org/jira/browse/HADOOP-2085).

 

I tried the join example with KeyValueTextInputFormat as the input format, but
got the following exception:

 

 

java.lang.NullPointerException
        at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:185)
        at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:304)
        at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:374)
        at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:542)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:803)
        at org.apache.hadoop.examples.Join.run(Join.java:169)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Join.main(Join.java:178)

 

 

The exception happens because hadoop.mapred.join.Parser instantiates the
InputFormat class without a JobConf, while KeyValueTextInputFormat needs its
configure method to be called with a proper JobConf before it is used.
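To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes; SketchConf and SketchKeyValueFormat are hypothetical stand-ins, and I am assuming the real input format builds some internal state in configure that isSplitable later dereferences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf: just a bag of settings.
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key, String dflt) { return values.getOrDefault(key, dflt); }
}

// Hypothetical stand-in for an input format whose state is built in configure().
class SketchKeyValueFormat {
    // Stays null until configure() runs.
    private Map<String, String> codecsBySuffix;

    void configure(SketchConf conf) {
        codecsBySuffix = new HashMap<>();
        codecsBySuffix.put(".gz", conf.get("sketch.gzip.codec", "gzip"));
    }

    // A file is treated as splittable only if no codec suffix matches its name.
    boolean isSplitable(String fileName) {
        for (String suffix : codecsBySuffix.keySet()) { // NPE if configure() never ran
            if (fileName.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }
}

class NullConfDemo {
    // Returns true if isSplitable fails on a format that was never configured.
    static boolean failsWithoutConfigure() {
        SketchKeyValueFormat unconfigured = new SketchKeyValueFormat();
        try {
            unconfigured.isSplitable("part-00000");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unconfigured fails: " + failsWithoutConfigure());

        SketchKeyValueFormat configured = new SketchKeyValueFormat();
        configured.configure(new SketchConf());
        System.out.println("plain file splittable: " + configured.isSplitable("part-00000"));
        System.out.println("gzip file splittable: " + configured.isSplitable("part-00000.gz"));
    }
}
```

The point is only that an object created without its configuration looks fine until the first method that needs the configured state is called.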

 

    public void parse(List<Token> ll) throws IOException {
      StringBuilder sb = new StringBuilder();
      Iterator<Token> i = ll.iterator();
      while (i.hasNext()) {
        Token t = i.next();
        if (TType.COMMA.equals(t.getType())) {
          try {
            inf = (InputFormat)ReflectionUtils.newInstance(
                Class.forName(sb.toString()).asSubclass(InputFormat.class),
                null);     // <-- missing JobConf

 

As a workaround, I added a call to ReflectionUtils.setConf in the getSplits
method of hadoop.mapred.join.Parser. With that change the NullPointerException
is gone and the join works as expected.

 

I am not sure this is a clean fix. Ideally, I'd like to pass the JobConf object
to the parse method so that it is available when the InputFormat is
instantiated.

 

    public InputSplit[] getSplits(JobConf job, int numSplits)
        throws IOException {
      ReflectionUtils.setConf(inf, job);   // <-- my workaround
      return inf.getSplits(getConf(job), numSplits);
    }
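The two options (configure late, just before use, or configure at instantiation) can be compared in a small standalone sketch. None of this is Hadoop code: DemoConf, DemoConfigurable, DemoFormat and DemoReflection are hypothetical names, and DemoReflection only imitates the general idea of a reflection helper that configures an object when a conf is supplied.

```java
// Hypothetical stand-in for JobConf.
class DemoConf {}

// Hypothetical stand-in for an interface with a configure(conf) hook.
interface DemoConfigurable {
    void configure(DemoConf conf);
}

// Hypothetical input format that records whether configure() was called.
class DemoFormat implements DemoConfigurable {
    private boolean configured = false;

    @Override
    public void configure(DemoConf conf) { configured = true; }

    boolean isConfigured() { return configured; }
}

class DemoReflection {
    // Configures the object only when a conf is supplied and the object accepts one.
    static void setConf(Object obj, DemoConf conf) {
        if (conf != null && obj instanceof DemoConfigurable) {
            ((DemoConfigurable) obj).configure(conf);
        }
    }

    // Creates an instance reflectively, then applies the conf (a null conf is skipped).
    static <T> T newInstance(Class<T> cls, DemoConf conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            setConf(obj, conf);
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }
}

class ConfigureTimingDemo {
    public static void main(String[] args) {
        DemoConf job = new DemoConf();

        // What the parser does today: a null conf leaves the format unconfigured.
        DemoFormat late = DemoReflection.newInstance(DemoFormat.class, null);
        System.out.println("after newInstance(null): " + late.isConfigured());

        // The workaround: configure just before the format is first used.
        DemoReflection.setConf(late, job);
        System.out.println("after setConf: " + late.isConfigured());

        // The alternative: hand the conf over when the format is instantiated.
        DemoFormat early = DemoReflection.newInstance(DemoFormat.class, job);
        System.out.println("after newInstance(job): " + early.isConfigured());
    }
}
```

Both routes leave the format configured before it is used. The difference is that the late call has to be repeated at every entry point that touches the format, while the early one needs the conf to be available at parse time.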

 

Many InputFormat subclasses may need their configure method to be called. Could
you look into this and see whether it is a valid bug? Thanks.

 

Haijun
