Adar Dembo has posted comments on this change.

Change subject: KUDU-1363: Add in-list predicates for extracting a set of 
equalities.
......................................................................


Patch Set 4:

(29 comments)

I reviewed the client pieces and only started reviewing the common stuff. Will 
finish that in the next iteration.

 > The TSAN failure seems to be something unrelated (perhaps some config issue 
 > in Jenkins?).

>From the console output:

17:11:33 isolate: 
push(/home/jenkins-slave/workspace/kudu-1/build/tsan/bin/cache-test) failed: 
http request failed: Internal Server Error (HTTP 500)
17:11:33 
17:11:34 Failed to run isolate
17:11:34 Traceback (most recent call last):
17:11:34   File 
"/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 430, in 
<module>
17:11:34     main(sys.argv[1:])
17:11:34   File 
"/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 426, in 
main
17:11:34     args.func(p, args)
17:11:34   File 
"/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 381, in 
run_all_tests
17:11:34     run_isolate(staging)
17:11:34   File 
"/home/jenkins-slave/workspace/kudu-1/build-support/dist_test.py", line 335, in 
run_isolate
17:11:34     '--'] + staging.gen_json_paths())
17:11:34   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
17:11:34     raise CalledProcessError(retcode, cmd)

Looks like some sort of issue submitting the tests to the distributed test 
server.

http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/client.cc
File src/kudu/client/client.cc:

Line 602: KuduPredicate* KuduTable::NewInListPredicate(const Slice &col_name,
Nit: const Slice& col_name. Check your new code elsewhere to make sure any 
ampersand (ref) or star (pointer) is adjacent to the type itself, not the 
variable name.


Line 605:   StringPiece name_sp(reinterpret_cast<const char*>(col_name.data()), 
col_name.size());
This column name check is the same in NewComparisonPredicate; can you share the 
code?


Line 619:     // InListPredicate is not available for BOOL datatype.
Why not?


Line 624:       Status::NotSupported("Inlist predicate not supported for 
datatype `bool`")));
Nit: Status messages should start with a lower-case letter.


Line 626:   return new KuduPredicate(new 
InListPredicateData(s->column(col_idx), op, values));
Should we also check that every type in 'values' is equal to 
s->column(col_idx).type_info()->physical_type()?


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 443:   // Create a new inlist predicate which can be used for scanners
Nit: is "inlist" useful terminology? Is it a term people from relational 
database backgrounds will be familiar with? If not, I suggest you reword this 
sentence to explain in plainer English how this predicate works.


Line 446:   // The type of 'values' list must correspond to the type of the 
column to which
Nit: "The type of 'values' list" is weird. Can you reword? Maybe "the types of 
the entries in 'values'..."


Line 452:   // The caller owns the result until it is passed into 
KuduScanner::AddConjunctPredicate().
        :   // The returned predicate takes ownership of 'value'.
Update this.


Line 459:                                     KuduPredicate::ComparisonOp op,
Shouldn't need an op, the "comparison" is implicit (IN).


Line 460:                                     std::vector<KuduValue*>* values);
Taking ownership of the vector by pointer itself is a little awkward because it 
forces callers to allocate it on the heap. Instead, what if we did this:
1. Continue to pass the vector by pointer, but
2. Internally, don't store the pointer; move its contents to a different vector 
owned by the predicate.

You might implement it like this:

  class Bar {
    ...
  }

  class Foo {
   public:
    Foo(vector<Bar*>* belongs_to_caller) {
      belongs_to_foo_.swap(*belongs_to_caller);
    }

   private:
    vector<Bar*> belongs_to_foo_;
  }

In this example, after constructing a Foo, the caller can still access its 
vector (Foo didn't take ownership of it), but the vector is empty. This is both 
safe and performant since the swap probably just swaps internal vector pointers.

I think this will also allow you to simplify the way you store the values 
internally. The new predicate data class could have its own vector (not a 
vector pointer), and use STLDeleteElements() in the destructor. Then you won't 
need to use an AutoReleasePool at all.


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/predicate-test.cc
File src/kudu/client/predicate-test.cc:

Line 288:     { // column IN (<int-list>)
Nit: "value IN (<int-list>)"

Below too.


Line 289:       int count = 0;
Nit: declare count as close to where it's actually used as possible.

Below too.


Line 290:       RemoveDuplicates<T>(&test_values);
You should make a copy of test_values first, so that if someone adds a test 
after this they'll still get duplicates. But, on second thought, creating the 
predicate deduplicates the values, so why deduplicate them first here? If it's 
to get an accurate expected row count, could you constrain the deduplication to 
just that part of the test code?

Below too.


Line 292:       for (T v: test_values) {
Nit: for (T v : test_values) {


Line 305: STLDeleteElements(vals);
        :       delete vals;
Why do you need this? Doesn't the KuduPredicate take ownership of vals and its 
contents? Then the scanner in CountRows take's ownership of that, and deletes 
everything when it goes out of scope?

Below too.


Line 388:       for (string& v: test_values) {
You're not modifying v, so you should iterate by const-ref. On L391 too.

Note that when operating on integers, floats, and other simple data types, 
iterating by value is preferable, because it's probably cheaper to copy the 
value than it is to follow the reference.


Line 393:                           [&] (string& t) { return t == v; });
t should be const-ref.


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate-internal.h
File src/kudu/client/scan_predicate-internal.h:

Line 106:     std::vector<KuduValue*>* val_cloned = new 
std::vector<KuduValue*>();
        :     for (auto val: *vals_) {
        :       val_cloned->push_back(val->Clone());
        :     }
This can be combined:

  std::vector<KuduValue>* val_cloned = new 
std::vector<KuduValue*>(vals_->begin(), vals_->end());


Line 117:   void Dedup();
Nit: add an empty line after this.


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate.cc
File src/kudu/client/scan_predicate.cc:

Line 105:                                          std::vector<KuduValue*>* 
values )
Nit: extra space between values and )

Also you can drop std:: from vector here.


Line 134:   // KuduPredicate::INLIST
What's this comment for?


Line 140: void InListPredicateData::Dedup() {
With the suggestion I gave you for how to handle the incoming values array, it 
would be simpler to dedupe in the constructor like this:

  InListPredicateData::InListPredicateData(ColumnSchema col,
                                           KuduPredicate::ComparisonOp op,
                                           vector<KuduValue*>* values)
    : col_(move(col)),
      op_(op) {
    set<KuduValue*> deduped(values->begin(), values->end());
    vals_.assign(deduped.begin(), deduped.end());
    values->clear();
  }


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/scan_predicate.h
File src/kudu/client/scan_predicate.h:

Line 39:     INLIST
You shouldn't need this; InList implies both a kind of value (a list) and a 
"comparison" (IN).


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/value.cc
File src/kudu/client/value.cc:

Line 208:       return false;
I think a LOG(FATAL) here would be more appropriate; if someone adds a new 
value type, they should add new entries to operator< and operator==.

Below too.


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/client/value.h
File src/kudu/client/value.h:

Line 53:   // Returns true if KuduValue A < KuduValue B
       :   bool operator<(KuduValue& other) const;
       : 
       :   // Returns true if KuduValue A == KuduValue B
       :   bool operator==(KuduValue& other) const;
The google style guide (which Kudu uses) generally discourages operator 
overloading:

https://google.github.io/styleguide/cppguide.html#Operator_Overloading

In this case I think it's OK, but you should follow its advice regarding binary 
operators:

  Prefer to define non-modifying binary operators as non-member functions. If a 
binary operator is defined as a class member, implicit conversions will apply 
to the right-hand argument, but not the left-hand one. It will confuse your 
users if a < b compiles but b < a doesn't.

You can follow Slice as an example.


http://gerrit.cloudera.org:8080/#/c/2986/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

Line 169:         return values_;
Nit: got a stray tab char here.


Line 177:   explicit ColumnPredicate(PredicateType predicate_type,
Nit: explicit is only needed for single-arg constructors.


Line 182:   // Creates a new column predicate.
Nit: modify the two comments to make it clear what the difference in 
constructors are.


Line 226:   std::vector<void*>* values_;
Why does this need to be a pointer? Can't the ColumnPredicate own its own 
vector?


-- 
To view, visit http://gerrit.cloudera.org:8080/2986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I986cb13097f1cf4af2b752f58c4c38f412a6a598
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sameer Abhyankar <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sameer Abhyankar <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to