[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210395#comment-16210395
 ] 

ASF GitHub Bot commented on DRILL-5879:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1001#discussion_r145577107
  
    --- Diff: exec/vector/src/main/codegen/templates/ValueHolders.java ---
    @@ -33,34 +33,40 @@
      * This class is generated using freemarker and the ${.template_name} 
template.
      */
     public final class ${className} implements ValueHolder{
    -  
    +
       public static final MajorType TYPE = 
Types.${mode.name?lower_case}(MinorType.${minor.class?upper_case});
    +  public MajorType getType() {return TYPE;}
     
         <#if mode.name == "Repeated">
    -    
    +
         /** The first index (inclusive) into the Vector. **/
         public int start;
    -    
    +
         /** The last index (exclusive) into the Vector. **/
         public int end;
    -    
    +
         /** The Vector holding the actual values. **/
         public ${minor.class}Vector vector;
    -    
    +
         <#else>
         public static final int WIDTH = ${type.width};
    -    
    +
         <#if mode.name == "Optional">public int isSet;</#if>
         <#assign fields = minor.fields!type.fields />
         <#list fields as field>
         public ${field.type} ${field.name};
         </#list>
    -    
    +
    +    <#if minor.class == "VarChar">
    +    // -1: unknown, 0: not ascii, 1: is ascii
    +    public int asciiMode = -1;
    --- End diff --
    
    Drill is complex; nothing is as it seems. Drill has a very elaborate 
mechanism to rewrite byte codes to do scalar replacement. That is, holders are 
added to the Java source during code generation, but then are removed during 
byte code transforms. (Yes, that is silly given that Java 8 does a fine job on 
its own; but it is how Drill works today...)
    
    Will adding this variable change that behavior? Will the scalar replacement 
logic know to not replace this object? If so, will that hurt performance, or 
will the JVM go ahead and figure it out on its own? Is all the code that sets 
the variable inlined via code generation so that scalar replacement is possible?
    
    Yes, indeed, nothing in Drill is as simple as it seems...
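
To make the scalar-replacement question concrete, here is a minimal conceptual sketch of what the transformation does to a holder. It is not Drill's byte-code rewriter, and `IntHolderSketch` and `ScalarReplacementSketch` are hypothetical names used only for illustration. The first method passes a value through a short-lived holder object; the second shows the equivalent code once the holder's field has been replaced by a local variable.

```java
/** Hypothetical holder used only for illustration; not Drill's generated class. */
final class IntHolderSketch {
  int value;
}

final class ScalarReplacementSketch {

  // Shape of the code before scalar replacement: each value travels
  // through a short-lived holder object allocated inside the loop.
  static long sumWithHolder(int[] data) {
    long total = 0;
    for (int d : data) {
      IntHolderSketch h = new IntHolderSketch();
      h.value = d;
      total += h.value;
    }
    return total;
  }

  // Shape of the code after scalar replacement: the holder is gone and
  // its single field has become a plain local variable.
  static long sumScalarized(int[] data) {
    long total = 0;
    for (int d : data) {
      int hValue = d;
      total += hValue;
    }
    return total;
  }
}
```

Both methods compute the same result. The concern raised in the comment is whether a holder that carries an extra mutable field, shared across several evaluations, can still be reduced to locals in this way.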


> Optimize "Like" operator
> ------------------------
>
>                 Key: DRILL-5879
>                 URL: https://issues.apache.org/jira/browse/DRILL-5879
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>         Environment: * 
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> Query: select <column-list> from <table> where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`<source>` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec
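
The two ideas in the implementation details above can be sketched roughly as follows. This is a simplified illustration, not the code from the pull request: `VarCharHolderSketch`, `ContainsSketch`, and the `scans` counter are hypothetical, and the choice to specialize the loop by pattern length (1, 2, 3, and the general case) is an assumption, since the description only says that four versions of the loop were created.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a VarChar holder; not Drill's generated class. */
final class VarCharHolderSketch {
  final byte[] bytes;   // UTF-8 bytes of the value
  int asciiMode = -1;   // -1: unknown, 0: not ascii, 1: is ascii
  int scans = 0;        // counts full scans; for illustration only

  VarCharHolderSketch(String s) {
    bytes = s.getBytes(StandardCharsets.UTF_8);
  }

  // Scans the value only on the first call; later calls reuse asciiMode.
  boolean isAscii() {
    if (asciiMode == -1) {
      scans++;
      asciiMode = 1;
      for (byte b : bytes) {
        if (b < 0) {      // high bit set: part of a multi-byte UTF-8 sequence
          asciiMode = 0;
          break;
        }
      }
    }
    return asciiMode == 1;
  }
}

/** Hypothetical "contains" matcher with one loop per pattern length. */
final class ContainsSketch {

  static boolean contains(String input, String pattern) {
    return contains(input.getBytes(StandardCharsets.UTF_8),
                    pattern.getBytes(StandardCharsets.UTF_8));
  }

  static boolean contains(byte[] in, byte[] pat) {
    final int m = pat.length;
    if (m == 0) return true;
    if (m > in.length) return false;
    final int last = in.length - m;   // last valid start position
    final byte p0 = pat[0];
    switch (m) {
      case 1:
        for (int i = 0; i <= last; i++) {
          if (in[i] == p0) return true;
        }
        return false;
      case 2: {
        final byte p1 = pat[1];
        for (int i = 0; i <= last; i++) {
          if (in[i] == p0 && in[i + 1] == p1) return true;
        }
        return false;
      }
      case 3: {
        final byte p1 = pat[1], p2 = pat[2];
        for (int i = 0; i <= last; i++) {
          if (in[i] == p0 && in[i + 1] == p1 && in[i + 2] == p2) return true;
        }
        return false;
      }
      default:
        // General case: check the first byte, then compare the remainder.
        for (int i = 0; i <= last; i++) {
          if (in[i] != p0) continue;
          int j = 1;
          while (j < m && in[i + j] == pat[j]) j++;
          if (j == m) return true;
        }
        return false;
    }
  }
}
```

In this sketch, two LIKE evaluations that share one holder instance trigger a single scan, because the second call to `isAscii()` returns the cached `asciiMode`. The short fixed-length loops have no inner loop, which is the kind of shape that tends to be easier for a JIT compiler to optimize.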



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
