Binglin Chang has posted comments on this change. Change subject: KUDU-1235. Add Get API ......................................................................
Patch Set 1: (8 comments) > Do you have a comparison flame-graph of the tserver CPU usage handling a YCSB > workload of get vs scan Will post them on jira http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 888: ASSERT_NO_FATAL_FAILURE(InsertTestRows( > can you add a couple more tests here? I think there might be bugs with the ok http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client.h File src/kudu/client/client.h: Line 1073: Status SetProjectedColumns(const std::vector<std::string>& col_names) WARN_UNUSED_RESULT; > rename to SetProjectedColumnNames (to match the non-deprecated API in KuduS ok http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/getter-internal.cc File src/kudu/client/getter-internal.cc: Line 93: RETURN_NOT_OK(SchemaToColumnPBs(GetCurrentProjection(), request_.mutable_projected_columns())); > performance-wise, you could do this lazily only when the projection changes OK Line 111: KuduClient::LEADER_ONLY, > why restrict to leader only? I think this API should mirror the Scanner API Sure, will add later. http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 304: if (row_key_util::IncrementKey(&new_row, &arena)) { > this is likely to conflict a lot with the stuff Dan has been working on. Da Another way is to stop use scanspec, and implement Get in every rowset class, which is a major effort, but should have much better performance I think. Line 315: return iter->NextBlock(dst); > I'm not sure this is correct -- it's allowed for an iterator to return HasN OK, will check and add tests. Line 1551: vector<shared_ptr<RowwiseIterator> > *iters) const { > I don't think we need shared_ptr here -- this could be a vector<unique_ptr< yes Line 1567: RETURN_NOT_OK_PREPEND(rs->NewRowIterator(projection, snap, &row_it), > seems like we are leaving a lot of potential optimization on the table by n After compaction, most request only ends up in very few RS (mostly 1), so I skeptical here. One optimization I am think of is to lookup diskrowset first, then memory rowset, cause it's most likely then the record is in diskrowset. Also yes construct PB is time consuming too. -- To view, visit http://gerrit.cloudera.org:8080/2519 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id054a198373da04a3eca5b244516acc378e1e37c Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Binglin Chang <[email protected]> Gerrit-Reviewer: Binglin Chang <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
