Hi, Chris,
Thanks for adding the map-side join feature (http://issues.apache.org/jira/browse/HADOOP-2085). I tried the join example with KeyValueTextInputFormat as the input format, but got the following exception:

java.lang.NullPointerException
    at org.apache.hadoop.mapred.KeyValueTextInputFormat.isSplitable(KeyValueTextInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:185)
    at org.apache.hadoop.mapred.join.Parser$WNode.getSplits(Parser.java:304)
    at org.apache.hadoop.mapred.join.Parser$CNode.getSplits(Parser.java:374)
    at org.apache.hadoop.mapred.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:129)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:542)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:803)
    at org.apache.hadoop.examples.Join.run(Join.java:169)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Join.main(Join.java:178)

The exception happens because hadoop.mapred.join.Parser instantiates the InputFormat class without a JobConf, while KeyValueTextInputFormat needs its configure method to be called with a proper JobConf:

    public void parse(List<Token> ll) throws IOException {
      StringBuilder sb = new StringBuilder();
      Iterator<Token> i = ll.iterator();
      while (i.hasNext()) {
        Token t = i.next();
        if (TType.COMMA.equals(t.getType())) {
          try {
            inf = (InputFormat)ReflectionUtils.newInstance(
                Class.forName(sb.toString()).asSubclass(InputFormat.class),
                null);  // <-- missing JobConf

As a workaround, I added a "setConf" call in hadoop.mapred.join.Parser's getSplits method. With that change the NullPointerException is gone and the join works as expected. I am not sure this is a clean fix; ideally, I'd like to pass the JobConf object in the parse method when the InputFormat is instantiated.

    public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
      ReflectionUtils.setConf(inf, job);  // <-- my workaround
      return inf.getSplits(getConf(job), numSplits);
    }

Many InputFormat subclasses may need their configure method to be called. Can you look into this issue to see whether it is a valid bug?

Thanks.
Haijun