[
https://issues.apache.org/jira/browse/HBASE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584697#action_12584697
]
stack commented on HBASE-521:
-----------------------------
Comments on v5
Your entry in CHANGES.txt belongs in the INCOMPATIBLE section
You were going to deprecate obtainScanner and make a getScanner instead in
HTable?
Can you leave one of these in place?
{code}
- LOG.debug("Advancing internal scanner to startKey " + localStartKey);^M
this.currentRegionLocation = getRegionLocation(localStartKey);^M
- ^M
- LOG.debug("New region: " + this.currentRegionLocation);^
{code}
I know they look useful but I've found them useful debugging trying to figure
problematic region.... WIth them in place I know the last or current
problematic region when scanning.
Can you write the below...
{code}
if (values != null && values.size() != 0) {^M
- for (Map.Entry<Text, Cell> e: values.entrySet()) {^M
- // HStoreKey k = (HStoreKey) e.getKey();^M
- key.setRow(values.getRow());^M
- key.setVersion(e.getValue().getTimestamp());^M
- key.setColumn(EMPTY_COLUMN);^M
- results.put(e.getKey(), e.getValue().getValue());^M
- }^M
+ return values;^M
}^M
- return values == null ? false : values.size() != 0;^M
+ ^M
+ return null;^M
{code}
instead as just
{code}
return values;
{code}
Is this used?
{code}
+public class IdentityTableReduce extends TableReduce<Text, BatchUpdate> {
+ private static final Log LOG =
+ LogFactory.getLog(IdentityTableReduce.class.getName());
{code}
For long in reduce, how about <regionid> ':' <rowindex_zeropadded> or jenkins
hash of start row plus an index? It irks me that we're duplicating info -- row
and BatchUpdate. Same irritiation about TableOutputFormat.
I was going to argue that minimally, a hash of the row would be easier on the
sort but then hash will probably not sort same as key which would probably be a
problem in that any one reducer would be inserting all over the table
row-namespace -- probably not what we want.
This is great in TableOutputFormat
{code}
- public void write(Text key, MapWritable value) throws IOException {
- long xid = m_table.startUpdate(key); // start transaction
-
- for (Map.Entry<Writable, Writable> e: value.entrySet()) {
- m_table.put(xid, (Text)e.getKey(),
- ((ImmutableBytesWritable)e.getValue()).get());
- }
- m_table.commit(xid); // end transaction
+ public void write(Text key, BatchUpdate value) throws IOException {
+ m_table.commit(value);
}
{code}
You were going to change the name of HInternalScannerInterface
> Improve client scanner interface
> --------------------------------
>
> Key: HBASE-521
> URL: https://issues.apache.org/jira/browse/HBASE-521
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: client
> Reporter: Bryan Duxbury
> Assignee: Bryan Duxbury
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: 521-v2.patch, 521-v3.patch, 521-v4.patch, 521-v5.patch,
> 521.patch
>
>
> The current client scanner interface is pretty ugly. You need to instantiate
> an HStoreKey and SortedMap<Text, byte[]> externally and then pass them into
> next. This is pretty bad, because for starters, the client has to choose the
> implementation of the map when they create it, so it's extra brain cycles to
> figure that out. HStoreKey doesn't show up anywhere else in the entire client
> side API, but here it bubbles out of next as a way to get the row and
> presumably the timestamp of the columns.
> I propose that we supplant HScannerInterface with Scanner, an easier-to-use
> version for clients. Its next method would look something like:
> {code}
> public RowResult next() throws IOException;
> {code}
> This packs the data up much more cleanly, including using Cells as values
> instead of raw byte[], meaning you have much more granular timestamp
> information. You also don't need HStoreKey anymore.
> By breaking Scanner away from HScannerInterface, we can leave the internal
> scanning code completely alone (keep using HStoreKeys and such) but make the
> client cleaner.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.