[
https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918303#action_12918303
]
HBase Review Board commented on HBASE-3073:
-------------------------------------------
Message from: "Ryan Rawson" <[email protected]>
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/963/#review1426
-----------------------------------------------------------
trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
<http://review.cloudera.org/r/963/#comment4980>
I removed this in my svn client; it was a remnant of my perf testing.
- Ryan
> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
> Key: HBASE-3073
> URL: https://issues.apache.org/jira/browse/HBASE-3073
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.89.20100924
> Reporter: ryan rawson
> Assignee: ryan rawson
> Fix For: 0.90.0
>
> Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year. In
> the meantime, inefficiencies in the existing implementation have come to
> light, causing issues with benchmarks. Furthermore, some people are finding
> the API both difficult to use and not useful enough (see HBASE-1937).
> I propose the following new APIs:
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
> The implementations of these use a binary search on the underlying kvs array
> (which is sorted). I also have new implementations for
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
> These run faster in the small case but seem to run a bit slower in the big
> case. That is, if you call getValue() 10 times on a Result, the new
> implementation is faster, but if you call getValue() 100 times on the same
> Result, the old implementation is faster. My tests indicated the new
> implementation is about 10% slower when calling getValue() 100 times per
> Result, over 1000 iterations on 1000 different Result objects. Considering
> that most people use getValue() to retrieve named columns, and iteration when
> the qualifier list is unknown, I think this is a reasonable trade-off.
> Along with the new API, there is a recommendation to use raw() to get the
> list of KeyValue objects for iteration. This increases the visibility of
> KeyValue and is also much faster to iterate (4.9 times faster in my mini
> benchmark: 100 columns per Result, repeated 1000 times on different Result
> objects). Given the major speed boost I recently got by changing YCSB to use
> the raw() interface, I think that this is a must-have for 0.90.
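
The binary-search lookup that the description proposes for getColumn() and getColumnLatest() could look roughly like the sketch below. It is not the code from HBASE-3073.txt. Cell is a hypothetical stand-in for KeyValue, String fields replace the byte [] arguments for brevity, and the sort order (family, then qualifier, then newest timestamp first) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedColumnLookup {

    /** Hypothetical stand-in for KeyValue; not the HBase class. */
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Compares a cell's (family, qualifier) against the requested column. */
    static int compareColumn(Cell c, String family, String qualifier) {
        int cmp = c.family.compareTo(family);
        return cmp != 0 ? cmp : c.qualifier.compareTo(qualifier);
    }

    /**
     * Lower-bound binary search: index of the first cell whose column is
     * greater than or equal to the requested one (kvs.length if none).
     * Assumes kvs is sorted by family, qualifier, then timestamp descending.
     */
    static int firstIndexOf(Cell[] kvs, String family, String qualifier) {
        int lo = 0;
        int hi = kvs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareColumn(kvs[mid], family, qualifier) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** All versions of one column, newest first; empty list if absent. */
    static List<Cell> getColumn(Cell[] kvs, String family, String qualifier) {
        List<Cell> out = new ArrayList<>();
        int i = firstIndexOf(kvs, family, qualifier);
        while (i < kvs.length && compareColumn(kvs[i], family, qualifier) == 0) {
            out.add(kvs[i]);
            i++;
        }
        return out;
    }

    /** Newest version of one column, or null if the column is absent. */
    static Cell getColumnLatest(Cell[] kvs, String family, String qualifier) {
        int i = firstIndexOf(kvs, family, qualifier);
        boolean found = i < kvs.length
                && compareColumn(kvs[i], family, qualifier) == 0;
        return found ? kvs[i] : null;
    }

    public static void main(String[] args) {
        // Already sorted: family, qualifier, then timestamp descending.
        Cell[] kvs = {
            new Cell("cf", "a", 5L, "a5"),
            new Cell("cf", "a", 3L, "a3"),
            new Cell("cf", "b", 7L, "b7"),
        };
        System.out.println(getColumn(kvs, "cf", "a").size());
        System.out.println(getColumnLatest(kvs, "cf", "b").value);
    }
}
```

Each lookup costs O(log n) comparisons plus the number of versions returned, whereas iterating over the whole array once visits every entry exactly once. That is consistent with the trade-off described above between a few named-column lookups and bulk iteration via raw().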