[ https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679710#comment-13679710 ]
Sandy Pratt commented on HBASE-8691:
------------------------------------
> I was wondering how you got a performance loss, context switching?
I think it was probably memory and cache synchronization, but that's just a
guess. It could have been due to wrapping results in protobuf (but note there
was a separate thread calling next concurrently). This was in the context of
the streaming servlet BTW, not a normal RPC.
Rather than try to explain in English, the code looked like this:
    final long scannerId = hRegionServer.openScanner(regionName, scan);
    // Bounded hand-off queue: the producer can run at most 5 batches ahead.
    final ArrayBlockingQueue<Result[]> cache =
        new ArrayBlockingQueue<Result[]>(5);
    // ArrayBlockingQueue.put(null) throws NullPointerException, so a null
    // batch from next() (end of scan) is handed over as this marker instead.
    // It is compared by identity, so an ordinary empty batch still means
    // end of region.
    final Result[] END_OF_SCAN = new Result[0];

    final Thread producer = new Thread() {
      @Override
      public void run() {
        try {
          while (true) {
            Result[] results = hRegionServer.next(scannerId, BATCH_SIZE);
            cache.put(results == null ? END_OF_SCAN : results);
            if (results == null || results.length == 0) {
              break;
            }
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    };
    producer.start();

    long numRecords = 0;
    try {
      while (true) {
        Result[] res = cache.take();
        if (res == END_OF_SCAN) {
          // Scanner exhausted: send the end-of-scan marker and stop.
          EventResult.Builder eos = EventResult.newBuilder();
          eos.setEndOfScan(true);
          eos.setNumRecords(numRecords);
          eos.build().writeDelimitedTo(resp.getOutputStream());
          break;
        } else if (res.length == 0) {
          // Empty batch: send the end-of-region marker and stop.
          EventResult.Builder eor = EventResult.newBuilder();
          eor.setEndOfRegion(true);
          eor.setNumRecords(numRecords);
          eor.build().writeDelimitedTo(resp.getOutputStream());
          break;
        } else {
          // Re-wrap each stored protobuf value and stream it to the client.
          for (Result r : res) {
            byte[] b = r.getValue(..., ...);
            MyPB.Builder builder = MyPB.newBuilder();
            builder.mergeFrom(b);
            MyPB pb = builder.build();
            EventResult.Builder er = EventResult.newBuilder();
            er.setPbEvent(pb);
            er.build().writeDelimitedTo(resp.getOutputStream());
            numRecords++;
          }
        }
      }
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    } finally {
      resp.getOutputStream().close();
    }
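For anyone who wants to experiment with the hand-off pattern outside HBase, here is a minimal standalone sketch of the same idea: one thread prefetches batches into a bounded ArrayBlockingQueue while the caller drains it. Everything in it is hypothetical and only for illustration (the class PrefetchSketch, the method countRecords, the END marker, and String[] standing in for Result[]); it is not the attached servlet code. Since the queue does not accept null elements, the end of input is signalled with a marker array compared by identity.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PrefetchSketch {
    // Hypothetical end-of-input marker; compared by identity, never by content.
    private static final String[] END = new String[0];

    /** Counts records while a background thread prefetches up to `depth` batches. */
    public static long countRecords(final Iterator<String[]> source, int depth)
            throws InterruptedException {
        final BlockingQueue<String[]> queue = new ArrayBlockingQueue<String[]>(depth);

        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    try {
                        while (source.hasNext()) {
                            queue.put(source.next()); // blocks while the queue is full
                        }
                    } finally {
                        queue.put(END); // always signal, even if the source failed
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        long count = 0;
        while (true) {
            String[] batch = queue.take(); // blocks while the queue is empty
            if (batch == END) {
                break;
            }
            count += batch.length;
        }
        producer.join();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String[]> batches = Arrays.asList(
            new String[] {"a", "b"},
            new String[] {"c"},
            new String[] {"d", "e", "f"});
        System.out.println(countRecords(batches.iterator(), 2)); // prints 6
    }
}
```

Every batch crosses a lock-protected queue between two threads, so a sketch like this may be a convenient place to measure whether the hand-off itself accounts for a slowdown.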
> High-Throughput Streaming Scan API
> ----------------------------------
>
> Key: HBASE-8691
> URL: https://issues.apache.org/jira/browse/HBASE-8691
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.95.0
> Reporter: Sandy Pratt
> Labels: perfomance, scan
> Attachments: HRegionServlet.java, README.txt, RecordReceiver.java,
> ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java,
> StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in
> HBase, and have found that performance can be dramatically increased by the
> addition of a streaming scan API. The attached code constitutes a proof of
> concept that shows performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments. If the approach seems
> viable, I think such an API should be built into some future version of HBase.