Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2548: Codegen Tuple::MaterializeExprs() and use in TopN 
node
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 395:   static const char* LLVM_CLASS_NAME;
> does everybody need access to this? if not, make private and add friends
Can you explain the underlying principle here? This is used by codegen 
functions to get the TupleDescriptor LLVM type (and similarly we have a 
LLVM_CLASS_NAME for every class that needs to be referenced in an IRBuilder 
codegen'd function). According to 
https://google.github.io/styleguide/cppguide.html#Friends, "Friends extend, but 
do not break, the encapsulation boundary of a class". My understanding is that 
LLVM_CLASS_NAME users don't qualify since they're not implementing 
TupleDescriptor functionality, but I'm not super familiar with friend class 
best practices.


http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

Line 208:     SlotDescriptor* slot_desc = desc.slots()[i];
> instead of generating ir by hand below, could you do this by 
I implemented a quick prototype of the interpreted functionality so you can see 
what it looks like before I implement the rest and clean up what I don't need 
anymore. Here's how it works:

1. The loop body is broken out into a function for each type. We need this for 
cross-compiling so we can use the specific ExprContext::Get*Val() methods 
instead of the generic ExprContext::GetValue() method. This is done with a 
macro. So far I've only plumbed through int and string, but this would be 
extended to all types. This also requires having a RawValue::Write() function 
for each AnyVal subclass.

2. MaterializeExprs() has a switch statement in the for loop that calls the 
correct body function.

3. (NYI) I'm not sure how to write a generic codegen helper function to 
manually call the body function with the correct arguments, but maybe you can 
help me think of a way. I'm currently planning to write a new 
CodegenMaterializeExprs() function that loops over all the slot descriptors, 
converts the relevant slot parameters to constant LLVM values, and generates a 
call to the correct loop body function passing in the constant params. We'll 
have to generate constant ColumnType and Tuple::NullIndicatorOffset structs to 
pass in, but luckily I already have a patch to do ColumnType and NullIndicator 
should be similar. This function will also generate the initial memset call, 
which is the only non-loop-body work, so we don't have to cross-compile 
MaterializeExprs(). This function will also replace the Get*Val calls with the 
codegen'd versions of these functions.

What do you think? Even though this makes the C++ code more verbose, I think 
it's an improvement since we'll end up cross-compiling the AnyVal::Write() 
calls instead of reimplementing them using the IRBuilder in 
CodegenAnyVal::WriteToSlot(). However, I'm not sure this will make 
CodegenMaterializeExprs() any simpler, and it may actually get a little longer 
and more nuanced.

Feel free to grab me if you wanna discuss in person.


http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

Line 123:   template <bool collect_string_vals, bool no_pool>
> what is the point of templatizing this? if there's a codegen'd version of i
It's to differentiate the pool vs. the no pool function calls for when we 
replace the function calls during codegen. Having the template param gives us 
two different symbols for the two different cases. I'll expand the comment to 
explain this more.


Line 143:   static const char* MATERIALIZE_EXPRS_SYMBOL;
> move to private (and make whoever needs this a friend, hopefully that'll be
See my other comment, I think this case is similar.


-- 
To view, visit http://gerrit.cloudera.org:8080/1901
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib422a8d50303c21c6a228675157bf867e8619444
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-HasComments: Yes

Reply via email to