[jira] Commented: (HBASE-521) Improve client scanner interface

Bryan Duxbury (JIRA) Mon, 24 Mar 2008 15:19:34 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581697#action_12581697
 ]


Bryan Duxbury commented on HBASE-521:
-------------------------------------

>however: remove commented out code in ScannerHandler, TestScannerAPI
Done.

>I suppose RowResults could easily be so big, they'd blow out memory on client 
>or server. What's our defense? That when designing your request/MR, you do not 
>select too much? (I suppose we've always had this prob. This patch does not 
>introduce it)
We have this problem throughout our project. Our RPC framework is, well, an 
RPC, not a stream, so we can't really handle it any other way. I think we should 
deal with oversized requests and replies when people actually show us cases 
where it both makes sense and is a problem.

>Should we bite the bullet and change the name of the methods in HTable to be 
>getScanner instead of obtainScanner - just deprecate the old ones... In fact, 
>we probably should do this since we're breaking the methods anyway (add 
>deprecate to old obtainScanner methods).
Makes sense, we probably should do this. It is an issue that we're breaking 
compatibility - we could offer a DeprecatedScanner wrapper class that converts 
the RowResult back, if we wanted to. As for TableMap and friends, I don't see 
the point in trying to keep them backward compatible, because they didn't work 
quite right in the first place. The changes I have made give you a lot more 
options (BatchUpdates as values for TIF), things we actually wanted to fix in 
0.2 anyway.
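
A minimal sketch of what such a DeprecatedScanner wrapper could look like, in Java. Every type here is a simplified stand-in rather than the real HBase class: row and column names are plain Strings instead of Text, the end-of-scan null return is assumed, and reporting the newest cell timestamp through the key is an assumption, not something the patch specifies.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class DeprecatedScannerSketch {
    /** Stand-in for Cell: a value plus the timestamp it was written at. */
    static final class Cell {
        final byte[] value;
        final long timestamp;
        Cell(byte[] value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Stand-in for RowResult: a row key plus its column -> Cell map. */
    static final class RowResult {
        final String row;
        final SortedMap<String, Cell> cells;
        RowResult(String row, SortedMap<String, Cell> cells) { this.row = row; this.cells = cells; }
    }

    /** The new-style interface from the issue description. */
    interface Scanner {
        RowResult next() throws IOException; // null at end of scan (assumption)
    }

    /** Stand-in for the old key object that callers pass in to be filled. */
    static final class HStoreKey {
        String row;
        long timestamp;
    }

    /** Hypothetical wrapper exposing the old fill-in-place next() on top of a Scanner. */
    static final class DeprecatedScanner {
        private final Scanner delegate;
        DeprecatedScanner(Scanner delegate) { this.delegate = delegate; }

        boolean next(HStoreKey key, SortedMap<String, byte[]> results) throws IOException {
            RowResult row = delegate.next();
            if (row == null) {
                return false; // scan exhausted; caller's objects are left untouched
            }
            results.clear();
            long newest = 0L;
            for (Map.Entry<String, Cell> e : row.cells.entrySet()) {
                results.put(e.getKey(), e.getValue().value); // drop the per-cell timestamp
                newest = Math.max(newest, e.getValue().timestamp);
            }
            key.row = row.row;
            key.timestamp = newest; // assumption: report the newest cell timestamp
            return true;
        }
    }

    /** Builds a one-row in-memory Scanner for demonstration. */
    static Scanner singleRowScanner() {
        SortedMap<String, Cell> cells = new TreeMap<>();
        cells.put("info:a", new Cell(new byte[] {1}, 100L));
        cells.put("info:b", new Cell(new byte[] {2}, 200L));
        List<RowResult> rows = Arrays.asList(new RowResult("row1", cells));
        Iterator<RowResult> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

The wrapper loses the per-cell timestamps that RowResult carries, which is one reason to treat it as a migration aid only.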

I'll change HInternalScannerInterface to InternalScanner. Much cleaner to read.

Reverted changes to Migrate.java. Changes were relevant until another patch got 
applied.

> For IdentityTableReduce, the interface should be <Long, BatchUpdate>, rather 
> than <Text, BatchUpdate>?
Where would we generate the Long from? Why can't the Text rowkey be used as the 
identity attribute? Not that it really matters - all the BatchUpdates are just 
going to be applied individually. There's no merging or anything in 
IdentityTableReduce.
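
To illustrate the point, here is a rough Java sketch of an identity reduce that passes each update through unchanged under its row key. BatchUpdate, Collector, and the String row key are hypothetical stand-ins, not the real HBase or Hadoop types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IdentityReduceSketch {
    /** Stand-in for BatchUpdate: one column change for one row. */
    static final class BatchUpdate {
        final String column;
        final String value;
        BatchUpdate(String column, String value) { this.column = column; this.value = value; }
    }

    /** Stand-in for the output collector a reduce writes to. */
    interface Collector {
        void collect(String rowKey, BatchUpdate update);
    }

    /** Emits every update as-is; nothing is merged across updates for the same row. */
    static void reduce(String rowKey, Iterator<BatchUpdate> updates, Collector out) {
        while (updates.hasNext()) {
            out.collect(rowKey, updates.next());
        }
    }

    /** Runs reduce over two updates for one row and records what was emitted. */
    static List<String> demo() {
        List<String> emitted = new ArrayList<>();
        List<BatchUpdate> updates = Arrays.asList(
            new BatchUpdate("info:a", "x"),
            new BatchUpdate("info:b", "y"));
        reduce("row1", updates.iterator(),
            (row, u) -> emitted.add(row + " " + u.column + "=" + u.value));
        return emitted;
    }
}
```

Because the key is only carried along, its type does not affect the result in this sketch.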

> Improve client scanner interface
> --------------------------------
>
>                 Key: HBASE-521
>                 URL: https://issues.apache.org/jira/browse/HBASE-521
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: 521.patch
>
>
> The current client scanner interface is pretty ugly. You need to instantiate 
> an HStoreKey and SortedMap<Text, byte[]> externally and then pass them into 
> next. This is pretty bad, because for starters, the client has to choose the 
> implementation of the map when they create it, so it's extra brain cycles to 
> figure that out. HStoreKey doesn't show up anywhere else in the entire client 
> side API, but here it bubbles out of next as a way to get the row and 
> presumably the timestamp of the columns.
> I propose that we supplant HScannerInterface with Scanner, an easier-to-use 
> version for clients. Its next method would look something like:
> {code}
> public RowResult next() throws IOException;
> {code}
> This packs the data up much more cleanly, including using Cells as values 
> instead of raw byte[], meaning you have much more granular timestamp 
> information. You also don't need HStoreKey anymore.
> By breaking Scanner away from HScannerInterface, we can leave the internal 
> scanning code completely alone (keep using HStoreKeys and such) but make the 
> client cleaner.
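
For illustration, a client loop over the proposed interface might look roughly like the Java sketch below. Only the next() signature comes from the description above. Cell, RowResult, the in-memory scanner, the String keys, and the null-at-end convention are all simplified assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class ScannerSketch {
    /** Stand-in for Cell: a value plus its own timestamp. */
    static final class Cell {
        final byte[] value;
        final long timestamp;
        Cell(byte[] value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Stand-in for RowResult: a row key plus its column -> Cell map. */
    static final class RowResult {
        final String row;
        final SortedMap<String, Cell> cells;
        RowResult(String row, SortedMap<String, Cell> cells) { this.row = row; this.cells = cells; }
    }

    /** The proposed client-facing interface. */
    interface Scanner {
        RowResult next() throws IOException; // null at end of scan (assumption)
    }

    /** Drains a scanner; the caller allocates no key or map up front. */
    static List<String> scanAll(Scanner scanner) throws IOException {
        List<String> lines = new ArrayList<>();
        RowResult row;
        while ((row = scanner.next()) != null) {
            for (Map.Entry<String, Cell> e : row.cells.entrySet()) {
                Cell cell = e.getValue();
                lines.add(row.row + "/" + e.getKey() + "@" + cell.timestamp + "="
                    + new String(cell.value, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    private static RowResult row(String key, String column, String value, long ts) {
        SortedMap<String, Cell> cells = new TreeMap<>();
        cells.put(column, new Cell(value.getBytes(StandardCharsets.UTF_8), ts));
        return new RowResult(key, cells);
    }

    /** Two-row in-memory Scanner for demonstration. */
    static Scanner demoScanner() {
        Iterator<RowResult> it = Arrays.asList(
            row("row1", "info:a", "x", 100L),
            row("row2", "info:a", "y", 200L)).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        for (String line : scanAll(demoScanner())) {
            System.out.println(line);
        }
    }
}
```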

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
