Adar Dembo has posted comments on this change. Change subject: KUDU-1363: Add in-list predicates for extracting a set of equalities. ......................................................................
Patch Set 4: (29 comments) I reviewed the client pieces and only started reviewing the common stuff. Will finish that in the next iteration. > The TSAN failure seems to be something unrelated (perhaps some config issue > in Jenkins?). >From the console output: 17:11:33 isolate: push(/home/jenkins-slave/workspace/kudu-1/build/tsan/bin/cache-test) failed: http request failed: Internal Server Error (HTTP 500) 17:11:33 17:11:34 Failed to run isolate 17:11:34 Traceback (most recent call last): 17:11:34 File "/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 430, in <module> 17:11:34 main(sys.argv[1:]) 17:11:34 File "/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 426, in main 17:11:34 args.func(p, args) 17:11:34 File "/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 381, in run_all_tests 17:11:34 run_isolate(staging) 17:11:34 File "/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 335, in run_isolate 17:11:34 '--'] + staging.gen_json_paths()) 17:11:34 File "/usr/lib/python2.7/subprocess.py", line 540, in check_call 17:11:34 raise CalledProcessError(retcode, cmd) Looks like some sort of issue submitting the tests to the distributed test server. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 602: KuduPredicate* KuduTable::NewInListPredicate(const Slice &col_name, Nit: const Slice& col_name. Check your new code elsewhere to make sure any ampersand (ref) or star (pointer) is adjacent to the type itself, not the variable name. Line 605: StringPiece name_sp(reinterpret_cast<const char*>(col_name.data()), col_name.size()); This column name check is the same in NewComparisonPredicate; can you share the code? Line 619: // InListPredicate is not available for BOOL datatype. Why not? Line 624: Status::NotSupported("Inlist predicate not supported for datatype `bool`"))); Nit: Status messages should start with a lower-case letter. Line 626: return new KuduPredicate(new InListPredicateData(s->column(col_idx), op, values)); Should we also check that every type in 'values' is equal to s->column(col_idx).type_info()->physical_type()? http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/client.h File src/kudu/client/client.h: Line 443: // Create a new inlist predicate which can be used for scanners Nit: is "inlist" useful terminology? Is it a term people from relational database backgrounds will be familiar with? If not, I suggest you reword this sentence to explain in plainer English how this predicate works. Line 446: // The type of 'values' list must correspond to the type of the column to which Nit: "The type of 'values' list" is weird. Can you reword? Maybe "the types of the entries in 'values'..." Line 452: // The caller owns the result until it is passed into KuduScanner::AddConjunctPredicate(). : // The returned predicate takes ownership of 'value'. Update this. Line 459: KuduPredicate::ComparisonOp op, Shouldn't need an op, the "comparison" is implicit (IN). Line 460: std::vector<KuduValue*>* values); Taking ownership of the vector by pointer itself is a little awkward because it forces callers to allocate it on the heap. Instead, what if we did this: 1. Continue to pass the vector by pointer, but 2. Internally, don't store the pointer; move its contents to a different vector owned by the predicate. You might implement it like this: class Bar { ... } class Foo { public: Foo(vector<Bar*>* belongs_to_caller) { belongs_to_foo_.swap(*belongs_to_caller); } private: vector<Bar*> belongs_to_foo_; } In this example, after constructing a Foo, the caller can still access its vector (Foo didn't take ownership of it), but the vector is empty. This is both safe and performant since the swap probably just swaps internal vector pointers. I think this will also allow you to simplify the way you store the values internally. The new predicate data class could have its own vector (not a vector pointer), and use STLDeleteElements() in the destructor. Then you won't need to use an AutoReleasePool at all. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/predicate-test.cc File src/kudu/client/predicate-test.cc: Line 288: { // column IN (<int-list>) Nit: "value IN (<int-list>)" Below too. Line 289: int count = 0; Nit: declare count as close to where it's actually used as possible. Below too. Line 290: RemoveDuplicates<T>(&test_values); You should make a copy of test_values first, so that if someone adds a test after this they'll still get duplicates. But, on second thought, creating the predicate deduplicates the values, so why deduplicate them first here? If it's to get an accurate expected row count, could you constrain the deduplication to just that part of the test code? Below too. Line 292: for (T v: test_values) { Nit: for (T v : test_values) { Line 305: STLDeleteElements(vals); : delete vals; Why do you need this? Doesn't the KuduPredicate take ownership of vals and its contents? Then the scanner in CountRows take's ownership of that, and deletes everything when it goes out of scope? Below too. Line 388: for (string& v: test_values) { You're not modifying v, so you should iterate by const-ref. On L391 too. Note that when operating on integers, floats, and other simple data types, iterating by value is preferable, because it's probably cheaper to copy the value than it is to follow the reference. Line 393: [&] (string& t) { return t == v; }); t should be const-ref. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate-internal.h File src/kudu/client/scan_predicate-internal.h: Line 106: std::vector<KuduValue*>* val_cloned = new std::vector<KuduValue*>(); : for (auto val: *vals_) { : val_cloned->push_back(val->Clone()); : } This can be combined: std::vector<KuduValue>* val_cloned = new std::vector<KuduValue*>(vals_->begin(), vals_->end()); Line 117: void Dedup(); Nit: add an empty line after this. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate.cc File src/kudu/client/scan_predicate.cc: Line 105: std::vector<KuduValue*>* values ) Nit: extra space between values and ) Also you can drop std:: from vector here. Line 134: // KuduPredicate::INLIST What's this comment for? Line 140: void InListPredicateData::Dedup() { With the suggestion I gave you for how to handle the incoming values array, it would be simpler to dedupe in the constructor like this: InListPredicateData::InListPredicateData(ColumnSchema col, KuduPredicate::ComparisonOp op, vector<KuduValue*>* values) : col_(move(col)), op_(op) { set<KuduValue*> deduped(values->begin(), values->end()); vals_.assign(deduped.begin(), deduped.end()); values->clear(); } http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate.h File src/kudu/client/scan_predicate.h: Line 39: INLIST You shouldn't need this; InList implies both a kind of value (a list) and a "comparison" (IN). http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/value.cc File src/kudu/client/value.cc: Line 208: return false; I think a LOG(FATAL) here would be more appropriate; if someone adds a new value type, they should add new entries to operator< and operator==. Below too. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/value.h File src/kudu/client/value.h: Line 53: // Returns true if KuduValue A < KuduValue B : bool operator<(KuduValue& other) const; : : // Returns true if KuduValue A == KuduValue B : bool operator==(KuduValue& other) const; The google style guide (which Kudu uses) generally discourages operator overloading: https://google.github.io/styleguide/cppguide.html#Operator_Overloading In this case I think it's OK, but you should follow its advice regarding binary operators: Prefer to define non-modifying binary operators as non-member functions. If a binary operator is defined as a class member, implicit conversions will apply to the right-hand argument, but not the left-hand one. It will confuse your users if a < b compiles but b < a doesn't. You can follow Slice as an example. http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 169: return values_; Nit: got a stray tab char here. Line 177: explicit ColumnPredicate(PredicateType predicate_type, Nit: explicit is only needed for single-arg constructors. Line 182: // Creates a new column predicate. Nit: modify the two comments to make it clear what the difference in constructors are. Line 226: std::vector<void*>* values_; Why does this need to be a pointer? Can't the ColumnPredicate own its own vector? -- To view, visit http://gerrit.cloudera.org:8080/2986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I986cb13097f1cf4af2b752f58c4c38f412a6a598 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sameer Abhyankar <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sameer Abhyankar <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
