[
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703916#action_12703916
]
Erik Holstad commented on HBASE-1249:
-------------------------------------
@Stack
+ I totally agree that the name of the compareTo(KeyValue) method is not the
best. It should probably be something else, like the hasEnough or shouldAdd
that you suggested, or something along those lines.
+ Why the new code is faster than the old one. There are a couple of reasons.
Some notation first: you have your KeyValue iterator over the data, of length
k; the columns asked for (the gets), of length l; and your deleteStructure
(the deletes), of length m.
Old code:
1. The way a get is done today, every data entry is compared against all the
columns asked for, and for every match you do a contains check against the
deletes. This means you do k*l compares plus l*something for the delete
contains checks. On top of that, since the deletes are stored in a sortedSet,
every insert into it takes log(n).
2. The way the KeyValue object is used in many places is to call getRowLength
and then getColumnLength right after each other, which means that all the
calculations for the lengths and offsets are done multiple times.
3. Without a structure from the client that groups families together, you
have to look up the right store for every KeyValue.
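To illustrate point 2, here is a minimal sketch of decoding lengths and
offsets once and reusing them afterwards. The ParsedKey class and its byte
layout are hypothetical and much simpler than the real KeyValue format; they
only show the idea.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified key layout (NOT the actual KeyValue format):
// [2-byte row length][row bytes][1-byte column length][column bytes]
class ParsedKey {
    final byte[] buf;
    final int rowOffset;
    final int rowLength;
    final int columnOffset;
    final int columnLength;

    ParsedKey(byte[] buf, int offset) {
        this.buf = buf;
        // Decode each length exactly once and remember where each field starts.
        this.rowLength = ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
        this.rowOffset = offset + 2;
        this.columnLength = buf[rowOffset + rowLength] & 0xff;
        this.columnOffset = rowOffset + rowLength + 1;
    }

    // Accessors reuse the stored offsets instead of re-reading the length bytes.
    String row() {
        return new String(buf, rowOffset, rowLength, StandardCharsets.UTF_8);
    }

    String column() {
        return new String(buf, columnOffset, columnLength, StandardCharsets.UTF_8);
    }
}
```

With this shape, a compare that needs both the row and the column pays for
the length decoding once per key rather than once per accessor call.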
New code:
1. The new code doesn't have any complicated data structures, just
ArrayList<KeyValue>, and the only operation done on these is add(KeyValue),
so the time complexity, compared to the old way, is k+l+m, which is far fewer
compares.
2. In the new code I have moved all the compare methods into the actual code.
This means there will be more duplicated code, but I think it is a better
approach when we are going for speed. I don't recalculate any lengths or
offsets unless it is absolutely necessary.
But the biggest gain in time comes from not having complicated data structures
on the server but rather keeping it simple. Of course there are other things
that become more complicated, like merging the lists after every storefile,
but there is no way around that as I see it. I don't think it gets much faster
than doing a sorted merge.
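For reference, a sorted merge of two already-sorted lists can be sketched as
below. SortedMerge is a hypothetical helper written for illustration, not the
merge in the patch; it works on any Comparable element type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges two sorted lists in one pass.
class SortedMerge {
    static <T extends Comparable<? super T>> List<T> merge(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            // On ties, take from a first so its entries keep their precedence.
            if (a.get(i).compareTo(b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        // At most one of these loops runs: copy whatever is left over.
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }
}
```

Each element is looked at once, so merging lists of sizes p and q costs at
most p+q-1 compares.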
> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
> Key: HBASE-1249
> URL: https://issues.apache.org/jira/browse/HBASE-1249
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: HBASE-1249-Example-v1.pdf, HBASE-1249-Example-v2.pdf,
> HBASE-1249-GetQuery-v1.pdf, HBASE-1249-GetQuery-v2.pdf,
> HBASE-1249-GetQuery-v3.pdf, HBASE-1249-GetQuery-v4.pdf,
> HBASE-1249-StoreFile-v1.pdf, HBASE-1249-StoreFile-v4.pdf
>
>
> To discuss all the new and potential issues coming out of the change in key
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API
> (HBASE-880), server optimizations, etc...