[
https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301275#comment-16301275
]
ASF GitHub Bot commented on DRILL-6028:
---------------------------------------
Github user arina-ielchiieva commented on the issue:
https://github.com/apache/drill/pull/1071
@paul-rogers thanks for the code review. Your approach seems worth
trying. I have created a Jira for the enhancement [1].
[1] https://issues.apache.org/jira/browse/DRILL-6052
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-6028
> URL: https://issues.apache.org/jira/browse/DRILL-6028
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: HashTableGen5_for_1200_columns_AFTER.java,
> HashTableGen5_for_1200_columns_BEFORE.java,
> HashTableGen5_for_40_columns_AFTER.java,
> HashTableGen5_for_40_columns_BEFORE.java
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]',
> Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
> public boolean isKeyMatchInternalBuild(int incomingRowIdx, int
> htRowIdx)
> ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added the ability to ensure that method sizes won't go beyond the
> 64k bytecode limit imposed by the JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was
> added to create a new block only if the number of expressions added hits the
> upper bound defined by {{exec.java.compiler.exp_in_method_size}}. Once the
> number of expressions in a method hits the upper bound, the remaining code is
> moved into an inner method which is called from the original one.
> Example:
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe)
>     throws SchemaChangeException {
>     // some logic
>     doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
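The bound-driven splitting described above can be sketched outside of Drill's codegen as follows. This is a minimal illustration, not Drill code: `splitIntoBlocks`, the class name, and the literal expression strings are hypothetical, and the `bound` parameter stands in for {{exec.java.compiler.exp_in_method_size}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BlkCreateMode.TRUE_IF_BOUND idea: expressions
// are appended to the current block until the configured bound is hit, then
// a fresh block (a new inner method) is started and chained to.
public class BlockBoundSketch {

    static List<List<String>> splitIntoBlocks(List<String> expressions, int bound) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        blocks.add(current);
        for (String expr : expressions) {
            if (current.size() == bound) {   // bound hit: open a new inner method
                current = new ArrayList<>();
                blocks.add(current);
            }
            current.add(expr);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> exprs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            exprs.add("expr" + i + ";");
        }
        // With a bound of 3, eight expressions split into 3 chained methods.
        System.out.println(splitIntoBlocks(exprs, 3).size()); // prints 3
    }
}
```

Each inner list here corresponds to one generated method body; in the real generator, each block ends with a call into the next generated method.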
> During code generation {{ChainedHashTable}} added all code in its methods in
> one block (using {{BlkCreateMode.FALSE}}) since the {{getHashBuild}} and
> {{getHashProbe}} methods contained state and thus could not be split. In
> these methods a hash was generated for each key expression: for the first key
> the seed was 0, and for each subsequent key the hash was generated based on
> the seed from the previous key.
> To allow splitting these methods, the following was done:
> 1. Method signatures were changed to add a new parameter, {{seedValue}}.
> Initially the starting seed value was hard-coded during code generation (set
> to 0); now it is passed as a method parameter.
> 2. Initially the hash function calls for all keys were combined into one
> logical expression, which did not allow splitting. Now we create a logical
> expression for each key, so splitting is possible. The new {{seedValue}}
> parameter is used as a holder to pass the seed value on to the next key.
> 3. {{ParameterExpression}} was added to generate a reference to a method
> parameter during code generation.
> Code example:
> {noformat}
> public int getHashBuild(int incomingRowIdx, int seedValue)
>     throws SchemaChangeException
> {
>     {
>         NullableVarCharHolder out3 = new NullableVarCharHolder();
>         {
>             out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
>             if (out3 .isSet == 1) {
>                 out3 .buffer = vv0 .getBuffer();
>                 long startEnd = vv0 .getAccessor().getStartEnd((incomingRowIdx));
>                 out3 .start = ((int) startEnd);
>                 out3 .end = ((int)(startEnd >> 32));
>             }
>         }
>         IntHolder seedValue4 = new IntHolder();
>         seedValue4 .value = seedValue;
>         //---- start of eval portion of hash32 function. ----//
>         IntHolder out5 = new IntHolder();
>         {
>             final IntHolder out = new IntHolder();
>             NullableVarCharHolder in = out3;
>             IntHolder seed = seedValue4;
>             Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>                 if (in.isSet == 0) {
>                     out.value = seed.value;
>                 } else {
>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, in.buffer, seed.value);
>                 }
>             }
>             out5 = out;
>         }
>         //---- end of eval portion of hash32 function. ----//
>         seedValue = out5 .value;
>         return getHashBuild0((incomingRowIdx), (seedValue));
>     }
> }
> {noformat}
> Examples of code generation:
> {{HashTableGen5_for_40_columns_BEFORE.java}} - code compiles
> {{HashTableGen5_for_40_columns_AFTER.java}} - code compiles
> {{HashTableGen5_for_1200_columns_BEFORE.java}} - error during compilation,
> method too large
> {{HashTableGen5_for_1200_columns_AFTER.java}} - code compiles since methods
> were split
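The seed-chaining pattern in the generated code can be reduced to a plain-Java analogue. This is a hypothetical sketch, not Drill code: the class name is invented and `hashKey` is a trivial stand-in for `HashHelper.hash32`; only the chaining structure mirrors the generated methods.

```java
// Hypothetical analogue of the split generated code: each block hashes some
// keys into a running seed, then forwards the seed to the next block instead
// of letting a single method grow past the JVM's 64k bytecode limit.
public class SeedChainSketch {

    // Stand-in hash function: mixes one key into the running seed.
    static int hashKey(int key, int seed) {
        return 31 * seed + key;
    }

    // First generated block: hashes the first key (seed starts at 0),
    // then chains to the inner method with the updated seed.
    static int getHashBuild(int rowIdx, int seedValue) {
        seedValue = hashKey(rowIdx, seedValue);
        return getHashBuild0(rowIdx, seedValue);
    }

    // Second generated block: receives the seed as a parameter, so the
    // two methods together behave like one large method.
    static int getHashBuild0(int rowIdx, int seedValue) {
        seedValue = hashKey(rowIdx + 1, seedValue);
        return seedValue;
    }

    public static void main(String[] args) {
        // Equivalent to the unsplit computation hashKey(6, hashKey(5, 0)).
        System.out.println(getHashBuild(5, 0)); // prints 161
    }
}
```

Because the seed is now an explicit parameter rather than hidden state, the code generator is free to cut the method at any key boundary.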
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)