Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1259: new scanner API with an encapsulated Batch object
......................................................................

KUDU-1259: new scanner API with an encapsulated Batch object

This adds a new API for scanner results which encapsulates the result batch, allowing the caller to access the rows one row at a time, rather than constructing a vector<KuduRowResult>.

This is important in the case that the result rows are small or empty (for example an empty projection, or scanning a single int8 column). In those cases, a single batch may return millions or even tens of millions of rows, in which case the vector<KuduRowResult> was taking up tens or hundreds of MBs of memory.

The KuduRowResult class itself is renamed to KuduScanBatch::RowPtr, since that makes it more obvious that the row's lifetime is tied to the batch that it came from. The old name is preserved via a typedef that will provide for API compatibility for most users, though it does break the ABI since the implementation symbols are renamed. Given our beta status, it doesn't seem necessary to bump the soversion due to this ABI change.

This refactoring ends up transferring the RpcController into the returned RowBatch object, so it will actually be feasible to use this to avoid copying strings in Impala -- we can simply attach the KuduScanBatch to the Impala RowBatch to tie the lifecycle of indirect data to the lifecycle of the rows in Impala.

I made the appropriate small change in Impala to use the new API and verified that a SELECT COUNT(*) query which used to take 40+GB of RAM per server now only uses a few MB. Performance also improved about 10% for this query, likely due to less allocator pressure and page faults.

The new KuduScanBatch class fits the C++ "iterable sequence" concept, and thus works with the C++11 range-for loop. Unfortunately it doesn't seem to work directly with BOOST_FOREACH.
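
The shape described above can be sketched with a toy class. This is not the Kudu client code: ToyScanBatch, its RowPtr, ToyRowResult, GetInt8 and SumColumn are all hypothetical names used only for illustration, and the batch here is assumed to hold a single int8 column. The sketch shows a batch that owns the row data, hands out lightweight row views whose lifetime is tied to the batch, keeps the old name alive through a typedef, and exposes begin()/end() so that a C++11 range-for loop works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for an encapsulated scan batch (NOT the Kudu class).
// The batch owns the row data; callers never build a vector of row objects.
class ToyScanBatch {
 public:
  // Lightweight view of one row. Valid only while the owning batch is alive.
  class RowPtr {
   public:
    explicit RowPtr(const int8_t* data) : data_(data) {}
    int8_t GetInt8() const { return *data_; }
   private:
    const int8_t* data_;
  };

  // Minimal input iterator that materializes a RowPtr on dereference.
  class const_iterator {
   public:
    using iterator_category = std::input_iterator_tag;
    using value_type = RowPtr;
    using difference_type = std::ptrdiff_t;
    using pointer = const RowPtr*;
    using reference = RowPtr;

    const_iterator(const ToyScanBatch* batch, std::size_t idx)
        : batch_(batch), idx_(idx) {}
    RowPtr operator*() const { return batch_->Row(idx_); }
    const_iterator& operator++() { ++idx_; return *this; }
    bool operator==(const const_iterator& o) const { return idx_ == o.idx_; }
    bool operator!=(const const_iterator& o) const { return idx_ != o.idx_; }
   private:
    const ToyScanBatch* batch_;
    std::size_t idx_;
  };

  explicit ToyScanBatch(std::vector<int8_t> column)
      : column_(std::move(column)) {}

  std::size_t NumRows() const { return column_.size(); }

  // Returns a view of row 'idx'; no per-row allocation takes place.
  RowPtr Row(std::size_t idx) const {
    assert(idx < column_.size());
    return RowPtr(&column_[idx]);
  }

  const_iterator begin() const { return const_iterator(this, 0); }
  const_iterator end() const { return const_iterator(this, NumRows()); }

 private:
  std::vector<int8_t> column_;  // Assumed layout: one int8 value per row.
};

// Old name kept as a typedef so that existing source code still compiles.
typedef ToyScanBatch::RowPtr ToyRowResult;

// Sums the column by visiting the rows one at a time with a range-for loop.
inline long SumColumn(const ToyScanBatch& batch) {
  long total = 0;
  for (ToyRowResult row : batch) {
    total += row.GetInt8();
  }
  return total;
}
```

Because each row view is created on demand, the memory held per batch is the row data itself plus a fixed-size batch object, whatever the number of rows.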

Change-Id: I29fd4fbb8b906ffa591853ab625ac4b089da4bc9
Reviewed-on: http://gerrit.cloudera.org:8080/1562
Tested-by: Internal Jenkins
Reviewed-by: David Ribeiro Alves <[email protected]>
---
M src/kudu/client/CMakeLists.txt
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
D src/kudu/client/row_result.cc
M src/kudu/client/row_result.h
A src/kudu/client/scan_batch.cc
A src/kudu/client/scan_batch.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/tools/ts-cli.cc
13 files changed, 762 insertions(+), 528 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Internal Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/1562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I29fd4fbb8b906ffa591853ab625ac4b089da4bc9
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Binglin Chang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Martin Grund <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
