[ 
https://issues.apache.org/jira/browse/HADOOP-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547105
 ] 

Hudson commented on HADOOP-2234:
--------------------------------

Integrated in Hadoop-Nightly #318 (See 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/318/])

> [hbase] TableInputFormat erroneously aggregates map values
> ----------------------------------------------------------
>
>                 Key: HADOOP-2234
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2234
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>         Attachments: 2234.patch
>
>
> Edward Yoon reports the following phenomenon:
> Given a table:
> {code}
> row1 a: <aa> b: <bb> a:ca <aa2>
> row2 a: <aa3> b: <bb3>
> row3 a: <aa4> b: <bb4>
> {code}
> This map code:
> {code}
>   public void map(WritableComparable key, Writable value,
>       OutputCollector output, Reporter reporter) throws IOException {
>     if (m_collector.collector == null) {
>       m_collector.collector = output;
>     }
>     HStoreKey hKey = (HStoreKey) key;
>     MapWritable newValue = (MapWritable) value;
>     newValue.put(new Text("row:" + hKey.getRow().toString()),
>         new ImmutableBytesWritable(hKey.getRow().toString().getBytes()));
>  
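>     // Log a readable copy of the row (same conversion as the reduce code below).
>     Map<String, String> log = new HashMap<String, String>();
>     for (Map.Entry<Writable, Writable> e : newValue.entrySet()) {
>       log.put(e.getKey().toString(),
>           new String(((ImmutableBytesWritable) e.getValue()).get()));
>     }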
>  
>     LOG.info(log);
>     output.collect(hKey, newValue);
>   }
> {code}
> ... produces the following.
> {code}
> 07/11/20 14:07:53 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 07/11/20 14:07:53 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 07/11/20 14:07:53 INFO mapred.MapTask: numReduceTasks: 1
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa, b:=bb, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa3, b:=bb3, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortMap: {a:=aa4, b:=bb4, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO mapred.LocalJobRunner: 
> 07/11/20 14:07:53 INFO mapred.TaskRunner: Task 'map_0000' done.
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa, b:=bb, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa3, b:=bb3, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO algebra.SortReduce: {a:=aa4, b:=bb4, a:da=aa44, a:ca=aa2}
> 07/11/20 14:07:53 INFO mapred.LocalJobRunner: reduce > reduce
> 07/11/20 14:07:53 INFO mapred.TaskRunner: Task 'reduce_9ji2mr' done.
> {code}
> Notice how content from the first row is present when you output the second 
> and third rows.
> The problem is that TableInputFormat (TIF), after calling scanner.next, 
> copies the scanner.next value into the passed-in MapWritable value 
> (converting from TreeMap to MapWritable).  It resets the TreeMap passed to 
> scanner.next each time, but not the passed-in MapWritable, so columns from 
> earlier rows stay in the value.
> There is a similar problem in the reduce, where the outputter is collecting 
> together values (see log above).  Need to figure out what's going on here.  
> Below is the reduce code:
> {code}
> while (values.hasNext()) {
>   MapWritable data = (MapWritable) values.next();
>   Map<String, String> log = new HashMap<String, String>();
>   for (Map.Entry<Writable, Writable> e : data.entrySet()) {
>     log.put(e.getKey().toString(),
>         new String(((ImmutableBytesWritable) e.getValue()).get()));
>   }
>   LOG.info(log);
> }
> {code}
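
The map-side cause described above (the value object handed to next is reused 
but never reset) can be reproduced without Hadoop.  The sketch below is an 
illustration only and is not the attached 2234.patch: the class 
ReusedValueSketch and its methods are hypothetical names, plain java.util 
maps stand in for MapWritable and the scanner's TreeMap, and the rows are the 
three rows from the example table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reader loop; not HBase or Hadoop code.
class ReusedValueSketch {

    /** The three rows from the example table, as column -> value maps. */
    static List<Map<String, String>> exampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> row1 = new TreeMap<>();
        row1.put("a:", "aa");
        row1.put("b:", "bb");
        row1.put("a:ca", "aa2");
        Map<String, String> row2 = new TreeMap<>();
        row2.put("a:", "aa3");
        row2.put("b:", "bb3");
        Map<String, String> row3 = new TreeMap<>();
        row3.put("a:", "aa4");
        row3.put("b:", "bb4");
        rows.add(row1);
        rows.add(row2);
        rows.add(row3);
        return rows;
    }

    /**
     * Copies each row into a single reused value map, the way a record
     * reader fills the value object passed to next(), and returns a
     * snapshot of what the mapper would see for each row.
     */
    static List<Map<String, String>> readAll(List<Map<String, String>> rows,
                                             boolean clearFirst) {
        Map<String, String> value = new TreeMap<>(); // reused for every row
        List<Map<String, String>> seen = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (clearFirst) {
                value.clear(); // drop columns left over from the previous row
            }
            value.putAll(row);
            seen.add(new TreeMap<>(value)); // snapshot handed to map()
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + readAll(exampleRows(), false));
        System.out.println("with clear:    " + readAll(exampleRows(), true));
    }
}
```

Without the clear, row2 and row3 still carry row1's a:ca column, which matches 
the shape of the SortMap log lines above.  Clearing the reused value before 
each copy is one way to avoid that; a mapper can also defend itself by copying 
the value before holding on to it.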

-- 
This message is automatically generated by JIRA.