Todd Lipcon has posted comments on this change.

Change subject: KUDU-1235. Add Get API
......................................................................


Patch Set 1:

(10 comments)

didn't go through the java client yet or really look carefully at all the code. 
I think there are some potential bugs, though, that are worth solving before 
looking carefully.

I'm also interested to know where the main performance difference is, here. Do 
you have a comparison flame-graph of the tserver CPU usage handling a YCSB 
workload of get vs scan?

http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

Line 888:   ASSERT_NO_FATAL_FAILURE(InsertTestRows(
can you add a couple more tests here? I think there might be bugs with the case 
where a row actually has existed in multiple different RowSets, and maybe bugs 
where the row has been deleted. For example:

- insert test rows
- delete the rows
- verify that Get() of the row returns expected NotFound status
- force the tablet to Flush()
- verify that it still returns NotFound
- insert the rows again with new values
- verify that the correct value is returned


Can you also modify fuzz-itest.cc so that the GetRow() function there will 
randomly pick either using the new 'Get' API or a scan? I think that test will 
actually give you good coverage of the above scenario as well, but in a more 
thorough/random fashion.


http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 1073:   Status SetProjectedColumns(const std::vector<std::string>& 
col_names) WARN_UNUSED_RESULT;
rename to SetProjectedColumnNames (to match the non-deprecated API in 
KuduScanner)


http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/getter-internal.cc
File src/kudu/client/getter-internal.cc:

Line 47:     arena_(1024, 1024*1024),
this arena doesn't seem to be used


Line 61: Status KuduGetter::Data::SetProjectedColumns(const vector<string>& 
col_names) {
this function is copy-pasted from KuduScanner -- can we share the code somehow 
in a utility function that both of them call?


Line 93:   RETURN_NOT_OK(SchemaToColumnPBs(GetCurrentProjection(), 
request_.mutable_projected_columns()));
performance-wise, you could do this lazily only when the projection changes -- 
it's fairly expensive to construct the PB for a wide table. mind adding a TODO?


Line 111:         KuduClient::LEADER_ONLY,
why restrict to leader only? I think this API should mirror the Scanner API 
where the caller can pick which consistency guarantees are required


http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

Line 304:   if (row_key_util::IncrementKey(&new_row, &arena)) {
this is likely to conflict a lot with the stuff Dan has been working on. Dan, 
do you have any suggestions on how this could be easier on top of your 
ColumnPredicate work?


Line 315:       return iter->NextBlock(dst);
I'm not sure this is correct -- it's allowed for an iterator to return 
HasNext() but then not return any selected rows (eg due to deletions). see my 
comment on test coverage.


Line 1551:     vector<shared_ptr<RowwiseIterator> > *iters) const {
I don't think we need shared_ptr here -- this could be a vector<unique_ptr<...>>


Line 1567:     RETURN_NOT_OK_PREPEND(rs->NewRowIterator(projection, snap, 
&row_it),
seems like we are leaving a lot of potential optimization on the table by not 
checking bloom filters here. Is that something you plan to do as a follow-up?


-- 
To view, visit http://gerrit.cloudera.org:8080/2519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id054a198373da04a3eca5b244516acc378e1e37c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Binglin Chang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to