[jira] [Commented] (DRILL-3876) flatten() should not require a subsequent project to strip columns that aren't required

Jason Altekruse (JIRA) Mon, 05 Oct 2015 15:52:38 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944191#comment-14944191
 ]


Jason Altekruse commented on DRILL-3876:
----------------------------------------

While there might be an argument to include projection in other operators, this 
isn't actually the issue with flatten. When flatten was implemented, we created 
several planning rules to work around overall limitations in project and the 
expression materialization system, and to enable nested flattens without 
complicating the operator.

The code that needs to be fixed to remove the extra projection column is in 
SplitUpComplexExpressions. This new rule was necessary to allow for a function 
to pass a complex output into another function.

We don't have many functions that return complex outputs, but one possible use 
of this capability looks like this:

select convert_to(kvgen(a_map), 'JSON') from dfs.`/table.json`

The output of kvgen is a repeated map, and convert_to(field, 'JSON') expects a 
complex object as input. Drill does not currently support passing the complex 
output from one expression (in the form of a FieldWriter) as a direct input to 
another expression, at least not within a single project. The data must be 
serialized to a vector and then a FieldReader must be created to feed data into 
another function expecting the complex input.

To enable this functionality without enhancing project, the 
SplitUpComplexExpressions rule was added to break up each complex expression 
into its own project. The rule is currently inefficient: it assumes that an 
extra copy of the incoming data may need to be kept around as input to a 
different expression. For most of planning, flatten is treated like a complex 
expression in a project. Right after SplitUpComplexExpressions runs, a 
separate rule turns a project with a single flatten in it into a combination 
of a project and a flatten operation.
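
To make the splitting step concrete, here is a rough Python model of the idea. 
It is not Drill's code: the Col/Lit/Call classes, the COMPLEX_FUNCS set, the 
split_complex name and the EXPR$n aliases are all made up for illustration. It 
only shows a nested complex call being hoisted into its own lower project so 
the outer function can read it back as a column.

```python
from dataclasses import dataclass

# Assumption: the set of functions that produce complex (map/list) output.
COMPLEX_FUNCS = {"kvgen", "flatten"}


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


def split_complex(expr, stages=None):
    """Hoist complex calls used as arguments into their own stages.

    Returns (rewritten_expr, stages). Each stage is an (alias, expr) pair that
    stands for one lower project, innermost first. A complex call at the very
    top of the tree is left in place; only complex *arguments* are hoisted.
    """
    if stages is None:
        stages = []
    if not isinstance(expr, Call):
        return expr, stages
    new_args = []
    for arg in expr.args:
        arg, _ = split_complex(arg, stages)
        if isinstance(arg, Call) and arg.func in COMPLEX_FUNCS:
            # Hypothetical alias scheme for the intermediate column.
            alias = f"EXPR${len(stages)}"
            stages.append((alias, arg))
            arg = Col(alias)
        new_args.append(arg)
    return Call(expr.func, tuple(new_args)), stages


# convert_to(kvgen(a_map), 'JSON') from the query above
query = Call("convert_to", (Call("kvgen", (Col("a_map"),)), Lit("JSON")))
top, stages = split_complex(query)
```

In this model kvgen(a_map) ends up alone in a lower stage, and the top 
expression reads its result through a plain column reference.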

Example:
select flatten(a_list), a_list[0] from table;

Here the best thing to do would be to evaluate the indexing into the list 
before the flatten. Right now the rule just makes an extra copy of the 
original list, assuming an evaluation like this might need to happen later. 
This happens even when there are no other expressions in the project, which is 
a complete waste. There are a couple of simple fixes. The rule could do 
nothing in the case where only a single expression is present. The right thing 
to do, though, is to enhance the rule to look for other usages of the input to 
a complex expression among the other expressions in the project; if none are 
found, there is no need for the extra copy of the data.
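
The check described above could look roughly like the following Python sketch. 
Again, this is a model and not the planner's code: Col, Lit, Call, 
referenced_columns, needs_input_copy and the "item" function standing in for 
a_list[0] are hypothetical names. The point is only that the extra copy is 
needed when some other expression in the same project reads a column that the 
complex expression also reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


def referenced_columns(expr):
    """Collect the names of all input columns an expression reads."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, Call):
        cols = set()
        for arg in expr.args:
            cols |= referenced_columns(arg)
        return cols
    return set()  # literals read no columns


def needs_input_copy(project, index):
    """True if any input of project[index] is also read by a sibling expression."""
    inputs = referenced_columns(project[index])
    others = set()
    for i, expr in enumerate(project):
        if i != index:
            others |= referenced_columns(expr)
    return bool(inputs & others)


flatten_call = Call("flatten", (Col("a_list"),))
# "item" is an assumed stand-in for the a_list[0] indexing expression.
index_call = Call("item", (Col("a_list"), Lit(0)))

shared = needs_input_copy([flatten_call, index_call], 0)  # a_list is used twice
alone = needs_input_copy([flatten_call], 0)               # only the flatten
```

With only the flatten in the project there is no sibling usage, so the model 
reports that no copy is needed, which also covers the single-expression case.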

You can see the desired "correct" behavior on a simple flatten by commenting 
out the rule in DefaultSqlHandler line 342. This makes a few of the tests that 
rely on the rule fail, but it does make the basic case work.


> flatten() should not require a subsequent project to strip columns that 
> aren't required
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-3876
>                 URL: https://issues.apache.org/jira/browse/DRILL-3876
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Chris Westin
>            Assignee: Chris Westin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
