[
https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070067#comment-13070067
]
stack commented on HBASE-1938:
------------------------------
bq. I modified the unit test to make it work with the trunk as it is today (new
file attached).
Thanks.
Reviewing it, one thing you might want to do is study classes in hbase to get
the gist of the hadoop/hbase style. Notice how they use two spaces instead of
tabs and ~80 chars a line. But that's for the future. Not important here.
You just need to make sure your KVs have a memstorets that is at or below the
current readPoint. It looks like you are making KVs w/o setting memstorets.
The default is then used, and it's zero. The default read point is also zero.
The compare is <=, so it looks like you don't need to set the read point at
all. What you have should do no harm.
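To make the read point compare concrete, here is a minimal Java sketch. SketchKV and ReadPointSketch are hypothetical stand-ins, not the real KeyValue or memstore scanner classes. The only things taken from the comment above are the default of zero and the <= compare; everything else is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a KeyValue; only the visibility field is modeled. */
final class SketchKV {
    final String row;
    final long memstoreTS; // stays at 0 when the caller never sets it

    SketchKV(String row) {
        this(row, 0L);
    }

    SketchKV(String row, long memstoreTS) {
        this.row = row;
        this.memstoreTS = memstoreTS;
    }
}

/** Hypothetical helper showing the <= visibility compare. */
final class ReadPointSketch {
    /** A KV is visible when its memstoreTS is at or below the read point. */
    static boolean isVisible(SketchKV kv, long readPoint) {
        return kv.memstoreTS <= readPoint;
    }

    /** Returns the rows of all KVs visible at the given read point, in input order. */
    static List<String> visibleRows(List<SketchKV> kvs, long readPoint) {
        List<String> rows = new ArrayList<>();
        for (SketchKV kv : kvs) {
            if (isVisible(kv, readPoint)) {
                rows.add(kv.row);
            }
        }
        return rows;
    }
}
```

With a read point of zero, a KV built as new SketchKV("a") passes the check (0 <= 0), while new SketchKV("b", 5L) would be skipped until the read point reaches 5.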
Your new test class seems fine. Would be nice to add more tests. As the
memstore data structure grows, everything slows.
Another issue is about hacking on the ConcurrentSkipListSet that is memstore to
make it more suited to our accesses and perhaps to make it go faster (it's
public domain when you dig down into the java src).
bq. On a scan, in the "next()" part, hbase currently compares the values of two
internal iterators. In this test, the second list is always empty, hence the
cost of the comparator is lower than in real life.
What is this that you are referring to? Is it this? KeyValue kv =
scanner.next();
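For reference, the kind of next() the quoted remark seems to describe could be sketched like this in Java. TwoIteratorScanner is a made-up class, not the actual memstore scanner, and the comparisons counter exists only to show that an empty second iterator means the comparator is never called. It assumes both inputs are already sorted and contain no nulls.

```java
import java.util.Comparator;
import java.util.Iterator;

/** Hypothetical scanner that merges two sorted iterators by comparing their heads. */
final class TwoIteratorScanner<T> {
    private final Iterator<T> first;
    private final Iterator<T> second;
    private final Comparator<T> comparator;
    private T firstHead;  // next unreturned element of first, or null if exhausted
    private T secondHead; // next unreturned element of second, or null if exhausted
    long comparisons = 0; // number of comparator calls made so far

    TwoIteratorScanner(Iterator<T> first, Iterator<T> second, Comparator<T> comparator) {
        this.first = first;
        this.second = second;
        this.comparator = comparator;
        this.firstHead = advance(first);
        this.secondHead = advance(second);
    }

    private static <E> E advance(Iterator<E> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Returns the lowest remaining element, or null when both sides are exhausted. */
    T next() {
        if (firstHead == null && secondHead == null) {
            return null;
        }
        boolean takeFirst;
        if (secondHead == null) {
            takeFirst = true;   // nothing to compare against
        } else if (firstHead == null) {
            takeFirst = false;  // nothing to compare against
        } else {
            comparisons++;      // the comparator only runs when both heads exist
            takeFirst = comparator.compare(firstHead, secondHead) <= 0;
        }
        T result;
        if (takeFirst) {
            result = firstHead;
            firstHead = advance(first);
        } else {
            result = secondHead;
            secondHead = advance(second);
        }
        return result;
    }
}
```

In this sketch, scanning with an empty second iterator leaves comparisons at zero, which is the sense in which a test with an always-empty second list would understate the comparator cost.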
bq. But I don't think it is worth a patch just for this (it should be included
in a bigger patch, however).
Up to you but yes, the above is probably the way to go.
Thanks N.
> Make in-memory table scanning faster
> ------------------------------------
>
> Key: HBASE-1938
> URL: https://issues.apache.org/jira/browse/HBASE-1938
> Project: HBase
> Issue Type: Improvement
> Components: performance
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Attachments: MemStoreScanPerformance.java,
> MemStoreScanPerformance.java, caching-keylength-in-kv.patch, test.patch
>
>
> This issue is about profiling hbase to see if I can make hbase scans run
> faster when all is up in memory. Talking to some users, they are seeing
> about 1/4 million rows a second. It should be able to go faster than this
> (Scanning an array of objects, they can do about 4-5x this).