[ https://issues.apache.org/jira/browse/HBASE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917125#action_12917125 ]
stack commented on HBASE-3073:
------------------------------
On this patch, if it fixed HBASE-1937 and HBASE-2753 it'd help your case that
the fix for this issue is 'vital'.
What's this?
{code}
+ //private transient int size = 0; // size of underlying data (kv.getLength all added together)
{code}
Remove it?
For this change....
{code}
/**
- * Return the unsorted array of KeyValues backing this Result instance.
- * @return unsorted array of KeyValues
+ * Return the array of KeyValues backing this Result instance.
+ * @return array of KeyValues
*/
{code}
... is the result actually sorted? If so, the javadoc should say so.
Why are we sorting (see HBASE-2753)?
{code}
public KeyValue[] sorted() {
- if (isEmpty()) { // used for side effect!
- return null;
- }
+ raw(); // side effect of loading this.kvs
if (!sorted) {
{code}
...
Below .....
{code}
+ * @param family
+ * @param qualifier
+ * @return
+ */
{code}
Fix the javadoc, or at least the @return. It looks like if there is no such
column you get an empty List rather than null. The javadoc should note that.
The javadoc on the other highly visible methods elsewhere in the patch needs fixing too.
Do these new APIs replace others? Are there equivalents in the API already?
If so, deprecate the old ones and link to the new ones in the javadoc;
otherwise having new APIs that do the same thing is confusing.
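To illustrate the two javadoc points above, here is a minimal sketch. It is not code from the patch: the class `JavadocSketch`, the method `oldGetColumn`, the String-based signatures and the map-backed storage are all hypothetical, chosen only so the example stands alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JavadocSketch {
    // Hypothetical backing store keyed by "family:qualifier".
    // This is NOT how Result stores its data; it only keeps the example small.
    private final Map<String, List<String>> columns = new HashMap<>();

    void put(String family, String qualifier, String... values) {
        columns.put(family + ":" + qualifier, Arrays.asList(values));
    }

    /**
     * Returns every value stored under the given column.
     *
     * @param family column family to look in
     * @param qualifier column qualifier within the family
     * @return the values for the column; an empty list, never null,
     *         if there is no such column
     */
    List<String> getColumn(String family, String qualifier) {
        List<String> found = columns.get(family + ":" + qualifier);
        return found == null ? Collections.<String>emptyList() : found;
    }

    /**
     * Hypothetical older lookup that returns null for a missing column.
     *
     * @deprecated use {@link #getColumn(String, String)}, which returns an
     *             empty list rather than null when the column is absent.
     */
    @Deprecated
    List<String> oldGetColumn(String family, String qualifier) {
        return columns.get(family + ":" + qualifier);
    }
}
```

The @return line states the missing-column behaviour explicitly, and the @deprecated tag points callers at the replacement so there is only one recommended way to do the lookup.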
Fix these names:
valueEx
and
containsColumnEx
> New APIs for Result, faster implementation for some calls
> ---------------------------------------------------------
>
> Key: HBASE-3073
> URL: https://issues.apache.org/jira/browse/HBASE-3073
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.89.20100924
> Reporter: ryan rawson
> Assignee: ryan rawson
> Fix For: 0.90.0
>
> Attachments: HBASE-3073.txt
>
>
> Our existing API for Result hasn't been given much love in the last year. In
> the meantime, inefficiencies in the existing implementation have come to
> light, causing issues with benchmarks. Furthermore, some people are finding
> the API both difficult to use and not useful enough (see HBASE-1937).
> I propose the following new APIs:
> public List<KeyValue> getColumn(byte [] family, byte [] qualifier);
> public KeyValue getColumnLatest(byte [] family, byte [] qualifier);
> The implementations of these use a binary search on the underlying kvs array
> (which is sorted). I also have new implementations for
> public boolean containsColumn(byte [] family, byte [] qualifier);
> public byte [] getValue(byte [] family, byte [] qualifier);
> These run faster in the small case but seem to run a bit slower in the big
> case. That is, if you call getValue() 10 times for a Result it will be
> faster with the new implementation, but if you call getValue() 100 times for
> the same Result it is faster using the old implementation. My tests
> indicated about 10% slower on 'getValue' 100x with an overall 1000x iteration
> on 1000 different Result objects. Considering most people use getValue() to
> retrieve named columns, and iteration when the qualifier list is unknown, I
> think this is a reasonable trade-off.
> Along with the new API, there is a recommendation to use raw() to get the
> list of KeyValue objects for iteration. This increases the visibility of
> KeyValue, and also is much faster to iterate (4.9 times on my mini benchmark,
> 100 columns per Result, redone 1000 times on different Result objects).
> Given my recent major speed boost by changing YCSB to use the raw()
> interface, I think that this is a must have for 0.90.
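
The lookup described in the issue (a binary search over the sorted kvs array) might look roughly like the sketch below. It is not the code from the patch: `ResultSketch` and `Cell` are hypothetical stand-ins for Result and KeyValue, Strings replace byte arrays, and the sort order (family, then qualifier, then newest timestamp first) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ResultSketch {
    // Hypothetical stand-in for KeyValue, carrying only what the lookup needs.
    static final class Cell implements Comparable<Cell> {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        // Assumed order: family, then qualifier, then newest timestamp first.
        @Override
        public int compareTo(Cell o) {
            int c = family.compareTo(o.family);
            if (c != 0) return c;
            c = qualifier.compareTo(o.qualifier);
            if (c != 0) return c;
            return Long.compare(o.timestamp, timestamp);
        }
    }

    private final Cell[] kvs; // kept sorted so it can be binary searched

    ResultSketch(Cell... cells) {
        kvs = cells.clone();
        Arrays.sort(kvs);
    }

    private static boolean matches(Cell c, String family, String qualifier) {
        return c.family.equals(family) && c.qualifier.equals(qualifier);
    }

    // Index of the column's newest cell, or -1 if the column is absent.
    private int firstIndex(String family, String qualifier) {
        // A probe with the largest timestamp sorts at or before every real
        // cell of the column, so the search lands on the column's first cell.
        Cell probe = new Cell(family, qualifier, Long.MAX_VALUE, null);
        int pos = Arrays.binarySearch(kvs, probe);
        if (pos < 0) {
            pos = -(pos + 1); // convert "not found" into the insertion point
        }
        return (pos < kvs.length && matches(kvs[pos], family, qualifier)) ? pos : -1;
    }

    // All versions of the column, newest first; empty list if none.
    List<Cell> getColumn(String family, String qualifier) {
        List<Cell> out = new ArrayList<>();
        int pos = firstIndex(family, qualifier);
        if (pos < 0) return out;
        for (int i = pos; i < kvs.length && matches(kvs[i], family, qualifier); i++) {
            out.add(kvs[i]);
        }
        return out;
    }

    // Newest version of the column, or null if the column is absent.
    Cell getColumnLatest(String family, String qualifier) {
        int pos = firstIndex(family, qualifier);
        return pos < 0 ? null : kvs[pos];
    }
}
```

Each lookup costs one binary search plus a short scan over the matching versions, which fits the description's point that a handful of named-column lookups per Result is the case being optimised.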