[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482738#comment-14482738
 ] 

Eshcar Hillel commented on HBASE-13071:
---------------------------------------

Thanks [~stack] for running these rig tests.
I believe the right way to see the benefit of this feature is to measure the 
scan.next() latency at the client side. There you should see the latency going 
down as you increase the delays.
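As a rough illustration of such a client-side measurement, here is a minimal 
sketch. It does not use the HBase API: NextLatencyProbe and its methods are 
hypothetical names, and nextCall merely stands in for the scanner's next() 
call, which is assumed to return null once the scan is exhausted. The sleep 
simulates the per-record client-side processing delay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HBase): times each next() call as seen by the caller. */
final class NextLatencyProbe {

    /**
     * Calls nextCall until it returns null, recording how long each successful
     * call blocks (in nanoseconds). processingDelayMillis simulates the
     * client-side work done between two calls.
     */
    static <T> List<Long> measure(Supplier<T> nextCall, long processingDelayMillis) {
        List<Long> latenciesNanos = new ArrayList<>();
        while (true) {
            long start = System.nanoTime();
            T row = nextCall.get();              // the call whose latency we care about
            long elapsed = System.nanoTime() - start;
            if (row == null) {
                break;                           // assumed end-of-scan signal
            }
            latenciesNanos.add(elapsed);
            if (processingDelayMillis > 0) {
                try {
                    Thread.sleep(processingDelayMillis); // simulated processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return latenciesNanos;
    }

    /** Mean latency in nanoseconds, or 0 for an empty list. */
    static double mean(List<Long> latenciesNanos) {
        return latenciesNanos.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Stand-in data source; a real test would wrap the scanner here.
        Iterator<String> rows = Arrays.asList("row1", "row2", "row3").iterator();
        Supplier<String> next = () -> rows.hasNext() ? rows.next() : null;
        List<Long> latencies = measure(next, 1);
        System.out.println("calls=" + latencies.size() + " meanNanos=" + mean(latencies));
    }
}
```

Comparing the mean (or a high percentile) of these samples between the 
blocking and the async scanner, for each delay setting, is the comparison I 
have in mind.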
Obviously, an async scanner puts more pressure on the server, since it asks for 
records at a higher rate. Since you are already stress testing the server with 
50 clients (all heavy scanners), it could be that the extra pressure the async 
clients put on the server pushes it beyond its peak point.
Other than that, what is the prefetch size you are using? I assume it is less 
than 100. The async scanner gains the most when the client-side processing time 
(i.e., the delay) equals the server-side I/O time plus the network delay. If 
the prefetch size is too small, the network delays are more pronounced, and 
therefore the delays should be longer.
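To make the "maximum gain" point concrete, here is a back-of-the-envelope 
model. It is an idealization I am assuming for illustration, not something 
taken from the patch: a blocking scanner pays processing time plus fetch time 
per batch, while an async scanner that overlaps the two perfectly pays only 
the larger of them. PrefetchGainModel and its method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope model; not taken from the patch. */
final class PrefetchGainModel {

    /** Per-batch time when fetching and processing run back to back (blocking scanner). */
    static double syncBatchMillis(double processMillis, double fetchMillis) {
        return processMillis + fetchMillis;
    }

    /** Per-batch time when the next fetch fully overlaps processing (idealized async scanner). */
    static double asyncBatchMillis(double processMillis, double fetchMillis) {
        return Math.max(processMillis, fetchMillis);
    }

    /** Ratio of the two; at least one of the arguments must be positive. */
    static double speedup(double processMillis, double fetchMillis) {
        return syncBatchMillis(processMillis, fetchMillis)
                / asyncBatchMillis(processMillis, fetchMillis);
    }

    public static void main(String[] args) {
        double fetchMillis = 10.0; // assumed server-side I/O time + network delay per batch
        for (double processMillis : new double[] {0.0, 5.0, 10.0, 20.0, 40.0}) {
            System.out.println("process=" + processMillis + "ms speedup="
                    + speedup(processMillis, fetchMillis));
        }
    }
}
```

Under this model the ratio peaks at 2 when the two times are equal and falls 
back toward 1 as either one dominates. A real run will see less than that, 
since the overlap is never perfect.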

Finally, [~stack], could you please share the client code you use for your 
tests, either via this Jira or by sending it directly to me, so I can take a 
closer look and try it out myself?

> Hbase Streaming Scan Feature
> ----------------------------
>
>                 Key: HBASE-13071
>                 URL: https://issues.apache.org/jira/browse/HBASE-13071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
> HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
> HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
> HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
> HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
> HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
> HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
> HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
> gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
> latency.delay.png, latency.png, network.png
>
>
> A scan operation iterates over all rows of a table or a subrange of the 
> table. The synchronous nature in which the data is served at the client side 
> limits the speed at which the application traverses the data: it increases 
> the overall processing time, and may cause great variance in the time the 
> application waits for the next piece of data.
> The scanner next() method at the client side invokes an RPC to the 
> regionserver and then stores the results in a cache. The application can 
> specify how many rows will be transmitted per RPC; by default this is set to 
> 100 rows. 
> The cache can be considered as a producer-consumer queue, where the hbase 
> client pushes the data to the queue and the application consumes it. 
> Currently this queue is synchronous, i.e., blocking. More specifically, when 
> the application has consumed all the data from the cache --- so the cache is 
> empty --- the hbase client retrieves additional data from the server and 
> re-fills the cache with new data. During this time the application is blocked.
> Under the assumption that the application processing time can be balanced by 
> the time it takes to retrieve the data, an asynchronous approach can reduce 
> the time the application is waiting for data.
> We attach a design document.
> We also have a patch that is based on a private branch, and some evaluation 
> results of this code.
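
The producer-consumer view in the description above can be sketched with a 
bounded queue and a background thread. This is only an illustration of the 
idea, not the code in the attached patches: PrefetchingScanner is a 
hypothetical class, the Iterator stands in for the RPCs to the regionserver, 
and capacity plays the role of the cache size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of an asynchronous, prefetching scanner (not the patch code). */
final class PrefetchingScanner<T> {
    private static final Object END = new Object(); // marks the end of the scan

    private final BlockingQueue<Object> queue;

    /** source must not yield null elements; capacity bounds the number of cached rows. */
    PrefetchingScanner(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks only while the cache is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-producer");
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next row, or null once the source is exhausted. Blocks only if the cache is empty. */
    @SuppressWarnings("unchecked")
    T next() {
        try {
            Object item = queue.take();
            if (item == END) {
                queue.put(END); // producer is done, so this cannot block; later calls also see END
                return null;
            }
            return (T) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        PrefetchingScanner<String> scanner =
                new PrefetchingScanner<>(Arrays.asList("r1", "r2", "r3").iterator(), 2);
        List<String> seen = new ArrayList<>();
        for (String row = scanner.next(); row != null; row = scanner.next()) {
            seen.add(row); // application processing would happen here
        }
        System.out.println(seen);
    }
}
```

While the application processes a row, the producer thread keeps refilling the 
queue, so next() blocks only when the application outruns the fetches.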



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
