[ 
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703916#action_12703916
 ] 

Erik Holstad commented on HBASE-1249:
-------------------------------------

@Stack
+ I totally agree that the naming of the compareTo(KeyValue) method is not the 
best, should probably be something else like you suggested hasEnough or 
shouldAdd or something along those lines.

+ Why this new code is faster than the old one. There are a couple of things 
that will make this new code faster than the old one:
You have your KeyValue iterator, data length k, your columns asked for, gets 
length l,  and your deleteStructure, deletes length m.
1. The way a get is done today is that for every data you compare it to all the 
columns asked for and for every match you do a contains on that data. This 
means that you will do k*l +l*something for the delete contains check. Since 
the deletes are stored in a sortedSet every insert into it takes log(n)
2. The way the KeyValue object is used in many places it to call getRowLength 
and then getColumnLength after each other, this means that you have to do all 
the calculations for the lengths and offsets multiple times. 
3. When not having a structure from the client that groups families together 
you will have to get the right store for every KeyValue. 

1. The new code don't have any complicated data structures, just 
ArrayList<KeyValue> and the only operation that is done to these are 
add(KeyValue), so the time complexity if compared to the old way is k+l+m which 
is much fewer compares. 
2. In the new code I have moved all the compare methods in to the actual code, 
this means that there will be more code that is duplicated, but I think that it 
is a better approach when we are going for speed. I don't recalculate and 
lengths or offset if it is not absolutely necessary .

But the biggest gain in time comes from not having complicated data structures 
on the server but rather keeping it simple. Of course there are other things 
that becomes more complicated like merging the lists after every storefile, but 
there is no way around that as I see it. Don't think it gets much faster than 
doing a sorted merge.


> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1249-Example-v1.pdf, HBASE-1249-Example-v2.pdf, 
> HBASE-1249-GetQuery-v1.pdf, HBASE-1249-GetQuery-v2.pdf, 
> HBASE-1249-GetQuery-v3.pdf, HBASE-1249-GetQuery-v4.pdf, 
> HBASE-1249-StoreFile-v1.pdf, HBASE-1249-StoreFile-v4.pdf
>
>
> To discuss all the new and potential issues coming out of the change in key 
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API 
> (HBASE-880), server optimizations, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to