[ 
https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679710#comment-13679710
 ] 

Sandy Pratt commented on HBASE-8691:
------------------------------------

> I was wondering how you got a performance loss, context switching?

I think it was probably memory and cache synchronization, but that's just a 
guess.  It could have been due to wrapping results in protobuf (but note there 
was a separate thread calling next concurrently).  This was in the context of 
the streaming servlet BTW, not a normal RPC.

Rather than try to explain in English, the code looked like this:

                final long scannerId = hRegionServer.openScanner(regionName, scan);

                final ArrayBlockingQueue<Result[]> cache = new ArrayBlockingQueue<Result[]>(5);

                final Thread producer = new Thread() {
                        @Override
                        public void run() {
                                try {
                                        while (true) {
                                                Result[] results = hRegionServer.next(scannerId, BATCH_SIZE);
                                                // Note: ArrayBlockingQueue rejects null elements, so put(null) throws NullPointerException.
                                                cache.put(results);
                                                if (results == null || results.length == 0) {
                                                        break;
                                                }
                                        }
                                } catch (Exception e) {
                                        throw new RuntimeException(e);
                                }
                        }
                };

                producer.start();

                long numRecords = 0;

                try {
                        while (true) {
                                Result[] res = cache.take();
                                if (res == null) {
                                        EventResult.Builder eos = EventResult.newBuilder();
                                        eos.setEndOfScan(true);
                                        eos.setNumRecords(numRecords);
                                        eos.build().writeDelimitedTo(resp.getOutputStream());
                                        break;
                                } else if (res.length == 0) {
                                        EventResult.Builder eor = EventResult.newBuilder();
                                        eor.setEndOfRegion(true);
                                        eor.setNumRecords(numRecords);
                                        eor.build().writeDelimitedTo(resp.getOutputStream());
                                        break;
                                } else {
                                        for (Result r : res) {
                                                byte[] b = r.getValue(..., ...);
                                                MyPB.Builder builder = MyPB.newBuilder();
                                                builder.mergeFrom(b);
                                                MyPB pb = builder.build();
                                                EventResult.Builder er = EventResult.newBuilder();
                                                er.setPbEvent(pb);
                                                er.build().writeDelimitedTo(resp.getOutputStream());
                                                numRecords++;
                                        }
                                }
                        }
                } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                } finally {
                        resp.getOutputStream().close();
                }
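
For anyone who wants to experiment with the handoff in isolation, the following is a minimal, self-contained sketch of the same bounded-queue producer/consumer pattern. It is not the servlet code. It assumes a hypothetical BatchSource interface standing in for hRegionServer.next(), uses int[] batches in place of Result[], and marks the end of the stream with a sentinel object, because ArrayBlockingQueue does not accept null elements. The names PrefetchSketch, BatchSource, fixedSource, and drain are made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class PrefetchSketch {
    // Sentinel compared by identity; stands in for "no more batches".
    private static final int[] END_OF_STREAM = new int[0];

    /** Hypothetical stand-in for a scanner: returns batches, then null when exhausted. */
    interface BatchSource {
        int[] next() throws Exception;
    }

    /** Test helper (assumption): yields one batch per requested size, then null. */
    static BatchSource fixedSource(final int... sizes) {
        return new BatchSource() {
            private int i = 0;

            @Override
            public int[] next() {
                return i < sizes.length ? new int[sizes[i++]] : null;
            }
        };
    }

    /** Prefetches batches on a producer thread and counts records on the caller's thread. */
    static long drain(final BatchSource source, int queueDepth) throws InterruptedException {
        final BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(queueDepth);
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            try {
                int[] batch;
                // Stop on null or an empty batch; neither is enqueued.
                while ((batch = source.next()) != null && batch.length > 0) {
                    queue.put(batch);
                }
            } catch (Exception e) {
                failure.set(e);
            } finally {
                // Always signal the consumer so it cannot block forever on take().
                try {
                    queue.put(END_OF_STREAM);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.setDaemon(true);
        producer.start();

        long numRecords = 0;
        while (true) {
            int[] batch = queue.take();
            if (batch == END_OF_STREAM) {
                break;
            }
            numRecords += batch.length; // a real consumer would serialize each record here
        }
        producer.join();
        if (failure.get() != null) {
            throw new IllegalStateException("producer failed", failure.get());
        }
        return numRecords;
    }

    public static void main(String[] args) throws InterruptedException {
        // Three batches of 2, 2, and 1 records with a queue depth of 5: prints 5.
        System.out.println(drain(fixedSource(2, 2, 1), 5));
    }
}
```

Putting the sentinel in a finally block means a failure in the producer surfaces as an exception in the consumer instead of a hang. Whether the extra thread helps or hurts throughput still depends on the workload.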
                
> High-Throughput Streaming Scan API
> ----------------------------------
>
>                 Key: HBASE-8691
>                 URL: https://issues.apache.org/jira/browse/HBASE-8691
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.95.0
>            Reporter: Sandy Pratt
>              Labels: perfomance, scan
>         Attachments: HRegionServlet.java, README.txt, RecordReceiver.java, 
> ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java, 
> StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in 
> HBase, and have found that performance can be dramatically increased by the 
> addition of a streaming scan API.  The attached code constitutes a proof of 
> concept that shows performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments.  If the approach seems 
> viable, I think such an API should be built into some future version of HBase.
