[ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566323#comment-16566323
 ] 

ASF GitHub Bot commented on DRILL-6101:
---------------------------------------

paul-rogers commented on issue #1414: DRILL-6101: Optimized implicit columns 
handling within scanner
URL: https://github.com/apache/drill/pull/1414#issuecomment-409800290
 
 
   As it turns out, optimized processing for implicit columns is available in 
the long-stalled result set loader pull requests. That code grabs only the data 
columns (which it packs optimally into a result set), then adds implicit 
columns in a later step, avoiding the thrashing that can occur if the implicit 
columns are populated concurrently with incoming data.
   
   What is really needed is for the planner to recognize the implicit columns 
and project them separately from the data. That is, `SELECT *, filename` should 
result in projecting `*` from the table, plus only the `filename` implicit 
column. Otherwise, the best we can do is generate all the implicit columns, 
then throw away the ones that were not requested, which is obviously suboptimal.
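
   To make the two-step approach described above concrete, here is a minimal 
sketch in plain Java. It is not the result set loader code: `TwoPhaseBatch`, 
`loadData` and `addImplicit` are hypothetical names invented for illustration, 
and simple lists stand in for value vectors. Data columns are loaded first; 
only the implicit columns that were actually projected are filled afterwards, 
once the row count is known.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Drill classes.
final class TwoPhaseBatch {
    // Column name -> values, one entry per row.
    final Map<String, List<String>> columns = new LinkedHashMap<>();
    int rowCount;

    // Phase 1: load only the data columns read from the file.
    static TwoPhaseBatch loadData(List<String> names, List<String[]> rows) {
        TwoPhaseBatch batch = new TwoPhaseBatch();
        for (String name : names) {
            batch.columns.put(name, new ArrayList<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < names.size(); i++) {
                batch.columns.get(names.get(i)).add(row[i]);
            }
        }
        batch.rowCount = rows.size();
        return batch;
    }

    // Phase 2: with the row count known, fill each projected implicit
    // column in a single pass. Implicit columns that were not projected
    // are never materialized.
    void addImplicit(Map<String, String> implicitValues, List<String> projected) {
        for (String name : projected) {
            String value = implicitValues.get(name);
            columns.put(name, new ArrayList<>(Collections.nCopies(rowCount, value)));
        }
    }
}
```

   With `SELECT *, filename`, a caller would pass only `filename` in 
`projected`, so other implicit columns are never generated and then discarded.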

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Optimize Implicit Columns Processing
> ------------------------------------
>
>                 Key: DRILL-6101
>                 URL: https://issues.apache.org/jira/browse/DRILL-6101
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.12.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Critical
>              Labels: pull-request-available
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  
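
One way to read the suggested optimization is run-length storage: each distinct 
consecutive value is kept once, together with the row index where its run ends. 
The sketch below is an assumption about what such a structure could look like, 
not Drill's NullableVarChar vector and not the change in the pull request; 
`RunLengthVarCharColumn` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Drill's NullableVarChar vector.
final class RunLengthVarCharColumn {
    // One entry per run of equal consecutive values; null represents SQL NULL.
    private final List<byte[]> values = new ArrayList<>();
    // Exclusive end row index of each run, in increasing order.
    private final List<Integer> runEnds = new ArrayList<>();
    private int rowCount;

    // Appends a row. A value equal to the previous one extends the current
    // run instead of storing the bytes again.
    void add(String value) {
        byte[] bytes = value == null ? null : value.getBytes(StandardCharsets.UTF_8);
        int last = values.size() - 1;
        if (last >= 0 && Arrays.equals(values.get(last), bytes)) {
            runEnds.set(last, rowCount + 1);
        } else {
            values.add(bytes);
            runEnds.add(rowCount + 1);
        }
        rowCount++;
    }

    // Returns the value at the given row, using a binary search for the
    // first run whose end index is greater than the row.
    String get(int row) {
        if (row < 0 || row >= rowCount) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        int lo = 0;
        int hi = runEnds.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runEnds.get(mid) > row) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        byte[] bytes = values.get(lo);
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    int rowCount() { return rowCount; }

    int storedValueCount() { return values.size(); }
}
```

For an implicit column such as `filename`, which is constant for every row read 
from one file, a structure like this stores one value per file rather than one 
per row, at the cost of a logarithmic lookup on reads.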



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
