[ 
https://issues.apache.org/jira/browse/HBASE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584697#action_12584697
 ] 

stack commented on HBASE-521:
-----------------------------

Comments on v5

Your entry in CHANGES.txt belongs in the INCOMPATIBLE section

You were going to deprecate obtainScanner and make a getScanner instead in 
HTable?

Can you leave one of these in place?

{code}
-      LOG.debug("Advancing internal scanner to startKey " + localStartKey);^M
       this.currentRegionLocation = getRegionLocation(localStartKey);^M
-      ^M
-      LOG.debug("New region: " + this.currentRegionLocation);^
{code}

I know they look useful but I've found them useful debugging trying to figure 
problematic region.... WIth them in place I know the last or current 
problematic region when scanning.

Can you write the below...

{code}
       if (values != null && values.size() != 0) {^M
-        for (Map.Entry<Text, Cell> e: values.entrySet()) {^M
-          // HStoreKey k = (HStoreKey) e.getKey();^M
-          key.setRow(values.getRow());^M
-          key.setVersion(e.getValue().getTimestamp());^M
-          key.setColumn(EMPTY_COLUMN);^M
-          results.put(e.getKey(), e.getValue().getValue());^M
-        }^M
+        return values;^M
       }^M
-      return values == null ? false : values.size() != 0;^M
+      ^M
+      return null;^M
{code}

instead as just

{code}
return values;
{code}

Is this used?

{code}
+public class IdentityTableReduce extends TableReduce<Text, BatchUpdate> {
+  private static final Log LOG =
+    LogFactory.getLog(IdentityTableReduce.class.getName());
{code}

For long in reduce, how about <regionid> ':' <rowindex_zeropadded> or jenkins 
hash of start row plus an index?  It irks me that we're duplicating info -- row 
and BatchUpdate.  Same irritiation about TableOutputFormat.

I was going to argue that minimally, a hash of the row would be easier on the 
sort but then hash will probably not sort same as key which would probably be a 
problem in that any one reducer would be inserting all over the table 
row-namespace -- probably not what we want.

This is great in TableOutputFormat

{code}
-    public void write(Text key, MapWritable value) throws IOException {
-      long xid = m_table.startUpdate(key);              // start transaction
-
-      for (Map.Entry<Writable, Writable> e: value.entrySet()) {
-        m_table.put(xid, (Text)e.getKey(),
-            ((ImmutableBytesWritable)e.getValue()).get());
-      }
-      m_table.commit(xid);                              // end transaction
+    public void write(Text key, BatchUpdate value) throws IOException {
+      m_table.commit(value);
     }
{code}

You were going to change the name of HInternalScannerInterface



> Improve client scanner interface
> --------------------------------
>
>                 Key: HBASE-521
>                 URL: https://issues.apache.org/jira/browse/HBASE-521
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: 521-v2.patch, 521-v3.patch, 521-v4.patch, 521-v5.patch, 
> 521.patch
>
>
> The current client scanner interface is pretty ugly. You need to instantiate 
> an HStoreKey and SortedMap<Text, byte[]> externally and then pass them into 
> next. This is pretty bad, because for starters, the client has to choose the 
> implementation of the map when they create it, so it's extra brain cycles to 
> figure that out. HStoreKey doesn't show up anywhere else in the entire client 
> side API, but here it bubbles out of next as a way to get the row and 
> presumably the timestamp of the columns.
> I propose that we supplant HScannerInterface with Scanner, an easier-to-use 
> version for clients. Its next method would look something like:
> {code}
> public RowResult next() throws IOException;
> {code}
> This packs the data up much more cleanly, including using Cells as values 
> instead of raw byte[], meaning you have much more granular timestamp 
> information. You also don't need HStoreKey anymore.
> By breaking Scanner away from HScannerInterface, we can leave the internal 
> scanning code completely alone (keep using HStoreKeys and such) but make the 
> client cleaner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to