[ 
https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070819#comment-13070819
 ] 

stack commented on HBASE-1938:
------------------------------

bq. To me, it makes sense to precalculate the value returned by "peek", and 
reuse it in next().

If there is no chance of the value changing between the peek and next, it 
sounds good (I've not looked at this code in a while).
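A minimal sketch of the peek-then-next reuse being discussed. It assumes the underlying source is a plain java.util.Iterator of non-null elements, and CachingPeekIterator is a hypothetical name, not HBase's MemStoreScanner. peek() computes the next element once and caches it, and next() returns the cached element instead of recomputing it. This is only safe under the condition raised above: nothing may change the answer between the peek and the next.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not HBase code): computes the next element once,
 * caches it for peek(), and hands the cached element to next().
 * Assumes the source never yields null, since peek() uses null for "exhausted".
 */
class CachingPeekIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private T cached;          // element produced by the last peek(), if any
  private boolean hasCached; // true while 'cached' holds an unconsumed element

  CachingPeekIterator(Iterator<T> source) {
    this.source = source;
  }

  /** Returns the next element without consuming it, or null when exhausted. */
  T peek() {
    if (!hasCached && source.hasNext()) {
      cached = source.next(); // the (possibly expensive) work happens once
      hasCached = true;
    }
    return hasCached ? cached : null;
  }

  @Override
  public boolean hasNext() {
    return hasCached || source.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    if (hasCached) {          // reuse what peek() already computed
      T result = cached;
      cached = null;
      hasCached = false;
      return result;
    }
    return source.next();
  }
}
```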

bq.  It would be great to save this system call in a next().

Yes (I like how you figure there's a system call doing a thread-local get).

bq. In fact, as this value seems to be a TLS, I don't see how it could change 
during the execution of next(). What do you think?
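To illustrate the reasoning about TLS: a value held in a ThreadLocal can only be changed by the thread that owns it, so it cannot change during next() unless next() (or something it calls) sets it. A small sketch, assuming a ThreadLocal<Long> holder; ThreadReadPoint and its methods are hypothetical names, only loosely modelled on getThreadReadPoint(). On the JVM, ThreadLocal.get() is typically a per-thread map lookup rather than a system call.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-thread read point holder; names are illustrative only. */
class ThreadReadPoint {
  // Each thread sees its own copy, starting at 0.
  private static final ThreadLocal<Long> READ_POINT = ThreadLocal.withInitial(() -> 0L);

  static void set(long readPoint) {
    READ_POINT.set(readPoint);
  }

  static long get() {
    return READ_POINT.get();
  }

  /**
   * Sets a read point on a separate thread and returns what that thread saw.
   * The calling thread's own value is left untouched.
   */
  static long setOnOtherThread(long readPoint) {
    AtomicLong seen = new AtomicLong();
    Thread t = new Thread(() -> {
      set(readPoint);
      seen.set(get());
    });
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status
      throw new IllegalStateException(e);
    }
    return seen.get();
  }
}
```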

(I'm being lazy; I've not looked at the code.)  The updates to RWCC happen at 
well-defined points, so it should be easy enough to determine whether there is a 
problem w/ your presumption above.

bq. Last question on this: what is the use case in which getThreadReadPoint() 
will change during the whole scan (i.e., between next() calls)?

IIRC, we want to let the scan see the most up-to-date view of a row, though our 
guarantees are weaker than this (See http://hbase.apache.org/acid-semantics.html).
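A sketch of what the read point controls, assuming each in-memory cell records the write number that created it and that a reader only sees cells at or below its read point. VisibilitySketch, Cell and writeNumber are hypothetical names, not HBase classes. A scanner that re-reads the read point between next() calls can therefore see cells committed after the scan began.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of read-point visibility; names are illustrative. */
class VisibilitySketch {
  static final class Cell {
    final String row;
    final long writeNumber; // assumed: sequence number assigned at write time

    Cell(String row, long writeNumber) {
      this.row = row;
      this.writeNumber = writeNumber;
    }
  }

  /** Returns the rows of cells visible at the given read point, in input order. */
  static List<String> visibleRows(List<Cell> cells, long readPoint) {
    List<String> out = new ArrayList<>();
    for (Cell c : cells) {
      if (c.writeNumber <= readPoint) { // written at or before the read point
        out.add(c.row);
      }
    }
    return out;
  }
}
```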

bq. Most of the public methods (except reseek) are "synchronized"; does that imply 
that the scanner can be shared between threads?

That seems like a valid deduction to make.

bq. 1) Replacement of KeyValue lowest = getLowest();

You mean in MemStore#reseek?  What would you put in its place (Sorry if I'm not 
following the bouncing ball properly).

bq. ...don't get the data getThreadReadPoint()

So, we'd just hold on to the current read point, for how long?  The full scan?  
That might be possible given our lax guarantees above, though it would be nice 
not to have to give up up-to-the-millisecond views on rows.
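The two options being weighed can be sketched side by side, assuming a LongSupplier stands in for the current read point. PINNED captures it once when the scanner opens and holds it for the full scan; REFRESHED asks again on every next(). ReadPointPolicySketch, Policy and the long[] of write numbers are all hypothetical.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch contrasting two read-point policies for a forward scan. */
class ReadPointPolicySketch {
  enum Policy { PINNED, REFRESHED }

  private final long[] writeNumbers;           // write number of each cell, in scan order
  private final LongSupplier currentReadPoint; // stands in for the real read point source
  private final Policy policy;
  private final long pinnedReadPoint;          // captured at open; only used by PINNED
  private int pos = 0;

  ReadPointPolicySketch(long[] writeNumbers, LongSupplier currentReadPoint, Policy policy) {
    this.writeNumbers = writeNumbers;
    this.currentReadPoint = currentReadPoint;
    this.policy = policy;
    this.pinnedReadPoint = currentReadPoint.getAsLong();
  }

  /** Returns the index of the next visible cell, or -1 when the scan is done. */
  int next() {
    long readPoint = policy == Policy.PINNED ? pinnedReadPoint : currentReadPoint.getAsLong();
    while (pos < writeNumbers.length) {
      int i = pos++;
      if (writeNumbers[i] <= readPoint) { // skip cells written after readPoint
        return i;
      }
    }
    return -1;
  }
}
```

With cells written at {1, 3, 3} and a read point that moves from 1 to 3 after the first next(), the pinned scanner stops after cell 0 while the refreshed one goes on to return cell 1; that is the up-to-the-millisecond view that pinning gives up.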

bq. Another option is to share getThreadReadPoint() value for the two 
iterators, i.e. read the value in the next() function, and give it as a 
parameter to getNext()

What are the 'two iterators' here?
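A sketch of the suggestion in the quote, assuming the "two iterators" are two sorted sources merged by the scanner (the thread does not settle what they are; the layout and all names here are hypothetical). next() looks up the read point once and passes it to getNext() for both sources, so each call costs one lookup instead of two. Cells are modelled as long[]{key, writeNumber}. Reusing a head cached under an earlier read point assumes read points only move forward.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical two-source scanner sharing one read-point lookup per next(). */
class SharedReadPointScanner {
  private final Iterator<long[]> first;
  private final Iterator<long[]> second;
  private final LongSupplier readPointSource; // stands in for a thread-local lookup
  private long[] firstHead;  // next visible cell from 'first', or null
  private long[] secondHead; // next visible cell from 'second', or null

  SharedReadPointScanner(List<long[]> a, List<long[]> b, LongSupplier readPointSource) {
    this.first = a.iterator();
    this.second = b.iterator();
    this.readPointSource = readPointSource;
  }

  /** Advances 'it' to the next cell visible at 'readPoint', or returns null. */
  private static long[] getNext(Iterator<long[]> it, long readPoint) {
    while (it.hasNext()) {
      long[] cell = it.next();
      if (cell[1] <= readPoint) {
        return cell;
      }
    }
    return null;
  }

  /** Returns the smaller visible key across both sources, or -1 when done. */
  long next() {
    long readPoint = readPointSource.getAsLong(); // single lookup, shared below
    if (firstHead == null) firstHead = getNext(first, readPoint);
    if (secondHead == null) secondHead = getNext(second, readPoint);
    if (firstHead == null && secondHead == null) {
      return -1;
    }
    long[] chosen;
    if (secondHead == null || (firstHead != null && firstHead[0] <= secondHead[0])) {
      chosen = firstHead;
      firstHead = null;
    } else {
      chosen = secondHead;
      secondHead = null;
    }
    return chosen[0];
  }
}
```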

Sorry N, I don't have my head as deep in this stuff as you do currently so my 
questions and answers above may be off.  Please compensate appropriately.

> Make in-memory table scanning faster
> ------------------------------------
>
>                 Key: HBASE-1938
>                 URL: https://issues.apache.org/jira/browse/HBASE-1938
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: MemStoreScanPerformance.java, 
> MemStoreScanPerformance.java, caching-keylength-in-kv.patch, test.patch
>
>
> This issue is about profiling hbase to see if I can make hbase scans run 
> faster when all is up in memory.  Talking to some users, they are seeing 
> about 1/4 million rows a second.  It should be able to go faster than this 
> (Scanning an array of objects, they can do about 4-5x this).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
