[ 
https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676283#comment-13676283
 ] 

stack commented on HBASE-8691:
------------------------------

bq. ...but I don't know how to do that yet, and it didn't seem critical to 
validating my performance hypothesis.

Makes sense.  Nice hack starting Server and getting it going as a Servlet.

bq. ...but I did find that it's critical to do as little encoding of the stream 
as possible.

This is interesting.  If we compress or prefix-encode, we will want to put 
KVs together in blocks of 32k or so.  Your finding that...

bq. ...but I was surprised at just how few there actually are. I would have 
thought there was time to muck around with protobuf, but no.

... would seem to indicate that composing the blocks of prefix-encoded or 
compressed kvs would put us back to the send-pause-send-pause step function.

But what do you mean when you say this:

bq. I tested with scan caching 5000 and scan batch 5000

Were you batching up 5k kvs before writing them out on the wire?

As is, our rpc is not amenable at all to streaming.  There is one call and then 
it has a single result (or error).  Both call and result have their total size 
as effectively the first thing we transmit.  Introducing a protocol where size 
is not known and the results come in until an End-of-Stream marker is sent will 
be interesting to interweave into what we currently have; maybe it would be 
better to do as you do and just do a new protocol over another port.  Let me 
take a looksee.
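
To make the contrast between the two framings concrete, here is a minimal sketch in plain Java.  It is not HBase's RPC code: the class FramingSketch, its method names, and the use of -1 as the End-of-Stream marker are all hypothetical, and records are treated as opaque byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FramingSketch {
    // Hypothetical marker: a record length of -1 means "no more records".
    static final int END_OF_STREAM = -1;

    // Sized framing: buffer the whole result so its total size can be
    // written as the first thing on the wire.
    static byte[] writeSized(List<byte[]> records) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream bodyOut = new DataOutputStream(body);
        for (byte[] r : records) {
            bodyOut.writeInt(r.length);
            bodyOut.write(r);
        }
        bodyOut.flush();

        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        out.writeInt(body.size()); // total size goes first
        body.writeTo(out);
        out.flush();
        return wire.toByteArray();
    }

    // Streamed framing: each record is written as soon as it is available;
    // the total size is never known, so a marker terminates the stream.
    static void writeStreamed(Iterable<byte[]> records, DataOutputStream out)
            throws IOException {
        for (byte[] r : records) {
            out.writeInt(r.length);
            out.write(r);
        }
        out.writeInt(END_OF_STREAM);
        out.flush();
    }

    // Reader for the streamed framing: keep reading length-prefixed records
    // until the End-of-Stream marker arrives.
    static List<byte[]> readStreamed(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        int len;
        while ((len = in.readInt()) != END_OF_STREAM) {
            byte[] r = new byte[len];
            in.readFully(r);
            records.add(r);
        }
        return records;
    }
}
```

With the sized framing the sender has to hold the complete result before the first byte goes out; with the streamed framing records can leave as they are produced, but the receiver cannot know the total up front.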

Good on you Sandy.


                
> High-Throughput Streaming Scan API
> ----------------------------------
>
>                 Key: HBASE-8691
>                 URL: https://issues.apache.org/jira/browse/HBASE-8691
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.95.0
>            Reporter: Sandy Pratt
>              Labels: perfomance, scan
>         Attachments: HRegionServlet.java, README.txt, RecordReceiver.java, 
> ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java, 
> StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in 
> HBase, and have found that performance can be dramatically increased by the 
> addition of a streaming scan API.  The attached code constitutes a proof of 
> concept that shows performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments.  If the approach seems 
> viable, I think such an API should be built into some future version of HBase.
