[jira] Commented: (HBASE-1385) Revamp TableInputFormat, needs updating to match hadoop 0.20.x AND remove bit where we can make < maps than regions

Lars George (JIRA) Sun, 28 Jun 2009 12:17:13 -0700

    [ https://issues.apache.org/jira/browse/HBASE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724984#action_12724984 ]


Lars George commented on HBASE-1385:
------------------------------------

Re: the error I get for the test, here is what I see:

{code}
java.lang.NullPointerException
        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:899)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:142)
        at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:121)
{code}

I traced this down and found this in getSerializer():

{code}
if (serialization.accept(c)) {
  return (Serialization<T>) serialization;
}
{code}

which does this in the default WritableSerialization class:

{code}
public boolean accept(Class<?> c) {
  return Writable.class.isAssignableFrom(c);
}
{code}
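
As a minimal sketch of that check (not Hadoop's code): the types below are stand-ins for the real classes, and `findSerialization` is a hypothetical simplification that assumes the factory lookup yields null when no serialization accepts the class, which would be consistent with the NullPointerException in the trace above.

```java
// Stand-in for org.apache.hadoop.io.Writable (assumption: used only as a marker here).
interface FakeWritable {}

// Stand-in for an old mapred-style split, which is a Writable.
class OldStyleSplit implements FakeWritable {}

// Stand-in for a new mapreduce-style split, which is not a Writable.
class NewStyleSplit {}

class SerializationLookupSketch {
    // Mirrors the accept() check quoted above, against the stand-in interface.
    static boolean accept(Class<?> c) {
        return FakeWritable.class.isAssignableFrom(c);
    }

    // Hypothetical simplification of the factory lookup: returns the name of
    // the one registered serialization if it accepts c, otherwise null.
    static String findSerialization(Class<?> c) {
        return accept(c) ? "WritableSerialization" : null;
    }

    public static void main(String[] args) {
        System.out.println("old split accepted: " + accept(OldStyleSplit.class));
        System.out.println("new split accepted: " + accept(NewStyleSplit.class));
        // Dereferencing this result without a null check is what would blow up.
        System.out.println("lookup for new split: " + findSerialization(NewStyleSplit.class));
    }
}
```

Under that assumption, any caller that dereferences the lookup result for a non-Writable split class fails exactly where the stack trace points.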

So the class handed in must implement Writable, which makes sense given the 
name of the serialization class. Looking into where it is called, I see this 
in JobClient:

{code}
  T[] array = (T[]) splits.toArray(
      new org.apache.hadoop.mapreduce.InputSplit[splits.size()]);
  ...
  SerializationFactory factory = new SerializationFactory(conf);
  Serializer<T> serializer =
      factory.getSerializer((Class<T>) array[0].getClass());
  ...
{code}

So the concrete InputSplit class *must* implement Writable! Looking at the old 
(mind you!) InputSplit:

{code}
@Deprecated
public interface InputSplit extends Writable {
...
{code}

which is fine, but the new one in mapreduce does this:

{code}
public abstract class InputSplit {
...
{code}

and that's that: no Writable. Broken! So I can either implement Writable 
myself in our own split subclass and hope for the best, or... ? Suggestions?
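
For illustration, the first option could look roughly like the sketch below. Everything in it is an assumption made for the example: `WritableLike` and `SplitBase` are stand-ins for Hadoop's Writable and the new abstract InputSplit (so the snippet compiles without Hadoop), and `RegionSplitSketch` with its three fields is a hypothetical split, not the actual HBase class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method shapes).
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified stand-in for the new abstract mapreduce InputSplit.
abstract class SplitBase {
    abstract String[] getLocations();
}

// Hypothetical split that extends the abstract base AND implements the
// Writable-style interface, so a Writable-based serialization would accept it.
class RegionSplitSketch extends SplitBase implements WritableLike {
    private String startRow = "";
    private String endRow = "";
    private String location = "";

    // No-arg constructor so an instance can be created before readFields().
    RegionSplitSketch() {}

    RegionSplitSketch(String startRow, String endRow, String location) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.location = location;
    }

    String getStartRow() { return startRow; }
    String getEndRow() { return endRow; }

    @Override
    String[] getLocations() { return new String[] { location }; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(startRow);
        out.writeUTF(endRow);
        out.writeUTF(location);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields are read back in the same order they were written.
        startRow = in.readUTF();
        endRow = in.readUTF();
        location = in.readUTF();
    }

    // Serializes a split to bytes and reads it back into a fresh instance.
    static RegionSplitSketch roundTrip(RegionSplitSketch split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            RegionSplitSketch copy = new RegionSplitSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RegionSplitSketch copy = roundTrip(new RegionSplitSketch("aaa", "mmm", "host1"));
        System.out.println(copy.getStartRow() + " .. " + copy.getEndRow() + " @ " + copy.getLocations()[0]);
    }
}
```

Whether this is enough in the real job submission path is exactly the open question; the sketch only shows that a subclass can satisfy both the abstract base and the Writable contract.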

> Revamp TableInputFormat, needs updating to match hadoop 0.20.x AND remove bit 
> where we can make < maps than regions
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1385
>                 URL: https://issues.apache.org/jira/browse/HBASE-1385
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.21.0
>
>         Attachments: 1385-v2.patch, 1385-v3.patch, 1385-v4.patch, 1385.patch, 
> mr.patch
>
>
> Update TIF to match new MR.
> Remove the bit of logic where we will use number of configured maps as splits 
> count rather than regions.

