[
https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299263#comment-16299263
]
ASF GitHub Bot commented on DRILL-6028:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1071#discussion_r158164670
--- Diff:
exec/java-exec/src/test/java/org/apache/drill/TestUnionDistinct.java ---
@@ -754,4 +756,37 @@ public void testDrill4147_1() throws Exception {
}
}
+ @Test
+ public void testUnionWithManyColumns() throws Exception {
--- End diff ---
Why would a UNION operator need code for a generated hash table? The union
matches columns, then iterates over multiple result sets. Where would a
run-time hash table fit?
Do we need a different unit test that exercises the hash table code?
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-6028
> URL: https://issues.apache.org/jira/browse/DRILL-6028
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Fix For: 1.13.0
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]',
> Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
> public boolean isKeyMatchInternalBuild(int incomingRowIdx, int
> htRowIdx)
> ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added the ability to ensure that generated method size does not
> exceed the 64k bytecode limit imposed by the JVM.
> {{BlkCreateMode.TRUE_IF_BOUND}} was added to create a new block only when the
> number of expressions added hits the upper bound defined by
> {{exec.java.compiler.exp_in_method_size}}. Once the number of expressions in a
> method hits the upper bound, we create an inner method and call it from the
> outer one.
> Example:
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe)
>     throws SchemaChangeException {
>   // some logic
>   doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
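The splitting idea above can be sketched in miniature. This is a simplified illustration, not Drill's actual generator: the class name `MethodSplitter`, the `doSetupN` naming, and the bound value are all assumptions for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of the TRUE_IF_BOUND idea: once a method body holds
// BOUND expressions, close it with a call to a fresh inner method and keep
// emitting expressions there, forming a chain of small methods.
public class MethodSplitter {

  // Stand-in for exec.java.compiler.exp_in_method_size (value assumed).
  static final int BOUND = 2;

  static List<String> split(List<String> expressions) {
    List<String> methods = new ArrayList<>();
    StringBuilder body = new StringBuilder();
    int count = 0;
    int next = 0;
    for (String expr : expressions) {
      if (count == BOUND) {
        // Bound hit: delegate the remaining work to a new inner method.
        next++;
        body.append("doSetup").append(next).append("();\n");
        methods.add(body.toString());
        body = new StringBuilder();
        count = 0;
      }
      body.append(expr).append(";\n");
      count++;
    }
    methods.add(body.toString());
    return methods;
  }

  public static void main(String[] args) {
    // Five expressions with a bound of 2 yield a chain of three methods.
    List<String> methods = split(Arrays.asList("e1", "e2", "e3", "e4", "e5"));
    System.out.println(methods.size()); // prints 3
  }
}
```

Each generated method stays under the expression bound, so no single method's bytecode can approach the 64k limit regardless of how many expressions the query produces.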
> During code generation {{ChainedHashTable}} added all code in its methods in
> one block (using {{BlkCreateMode.FALSE}}), since the {{getHashBuild}} and
> {{getHashProbe}} methods carried state and thus could not be split. In these
> methods a hash was generated for each key expression: the first key used seed
> 0, and each subsequent key's hash was generated using the previous key's hash
> as its seed.
> To allow splitting these methods, the following was done:
> 1. Method signatures were changed: a new parameter {{seedValue}} was added.
> Initially the starting seed value was hard-coded to 0 during code generation;
> now it is passed as a method parameter.
> 2. Initially the hash function calls for all keys were transformed into one
> logical expression, which did not allow splitting. Now we create a logical
> expression for each key, so splitting becomes possible. The new {{seedValue}}
> parameter acts as a seed holder that passes the seed value on to the next key.
> 3. {{ParameterExpression}} was added to generate a reference to a method
> parameter during code generation.
> Code example:
> {noformat}
> public int getHashBuild(int incomingRowIdx, int seedValue)
>     throws SchemaChangeException {
>   {
>     NullableVarCharHolder out3 = new NullableVarCharHolder();
>     {
>       out3.isSet = vv0.getAccessor().isSet((incomingRowIdx));
>       if (out3.isSet == 1) {
>         out3.buffer = vv0.getBuffer();
>         long startEnd = vv0.getAccessor().getStartEnd((incomingRowIdx));
>         out3.start = ((int) startEnd);
>         out3.end = ((int) (startEnd >> 32));
>       }
>     }
>     IntHolder seedValue4 = new IntHolder();
>     seedValue4.value = seedValue;
>     //---- start of eval portion of hash32 function. ----//
>     IntHolder out5 = new IntHolder();
>     {
>       final IntHolder out = new IntHolder();
>       NullableVarCharHolder in = out3;
>       IntHolder seed = seedValue4;
>       Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>         if (in.isSet == 0) {
>           out.value = seed.value;
>         } else {
>           out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(
>               in.start, in.end, in.buffer, seed.value);
>         }
>       }
>       out5 = out;
>     }
>     //---- end of eval portion of hash32 function. ----//
>     seedValue = out5.value;
>     return getHashBuild0((incomingRowIdx), (seedValue));
>   }
> }
> {noformat}
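The chained-seed scheme in the generated code above can be distilled into a small sketch. This is illustrative only: the `hash32` below is a stand-in mixer, not Drill's `HashHelper.hash32`, and the two-method chain merely mimics the shape of the generated `getHashBuild`/`getHashBuild0` pair.

```java
// Illustrative sketch of seed chaining: each key is hashed with the previous
// key's hash as its seed, so per-key hash expressions can live in separate
// generated methods that hand seedValue along. hash32 is a stand-in, not
// Drill's HashHelper.hash32.
public class SeedChain {

  static int hash32(int value, int seed) {
    int h = seed ^ value;
    h *= 0x85ebca6b;          // mixing constant borrowed from MurmurHash3
    return h ^ (h >>> 13);
  }

  // Pre-split form: one big method hashes every key in sequence.
  static int hashAllKeys(int[] keys) {
    int seedValue = 0;                      // first key starts from seed 0
    for (int key : keys) {
      seedValue = hash32(key, seedValue);   // output seeds the next key
    }
    return seedValue;
  }

  // Split form: each method hashes one key, then delegates with the new
  // seed, mirroring getHashBuild(...) -> getHashBuild0(...).
  static int getHashBuild(int key0, int key1) {
    int seedValue = hash32(key0, 0);
    return getHashBuild0(key1, seedValue);
  }

  static int getHashBuild0(int key1, int seedValue) {
    return hash32(key1, seedValue);
  }
}
```

Because the seed travels as an explicit parameter rather than implicit state inside one expression, the generator is free to cut the chain between any two keys, which is exactly what makes the split possible.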
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)