[
https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301275#comment-16301275
]
ASF GitHub Bot commented on DRILL-6028:
---------------------------------------
Github user arina-ielchiieva commented on the issue:
https://github.com/apache/drill/pull/1071
@paul-rogers thanks for the code review. Your approach seems worth
trying. I have created a Jira for the enhancement [1].
[1] https://issues.apache.org/jira/browse/DRILL-6052
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-6028
> URL: https://issues.apache.org/jira/browse/DRILL-6028
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: HashTableGen5_for_1200_columns_AFTER.java,
> HashTableGen5_for_1200_columns_BEFORE.java,
> HashTableGen5_for_40_columns_AFTER.java,
> HashTableGen5_for_40_columns_BEFORE.java
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]',
> Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
> public boolean isKeyMatchInternalBuild(int incomingRowIdx, int
> htRowIdx)
> ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added the ability to ensure that method sizes won't go beyond the
> 64k bytecode limit imposed by the JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was
> added to create a new block only if the number of expressions added hits the
> upper bound defined by {{exec.java.compiler.exp_in_method_size}}. Once the
> number of expressions in a method hits the upper bound, the remaining code is
> moved into an inner method which is called from the original one.
> Example:
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe)
>     throws SchemaChangeException {
>     // some logic
>     doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
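The bound-driven splitting described above can be sketched outside of Drill's codegen as follows. This is a minimal illustration, not Drill code: `splitIntoBlocks`, the class name, and the literal expression strings are hypothetical, and the `bound` parameter stands in for {{exec.java.compiler.exp_in_method_size}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BlkCreateMode.TRUE_IF_BOUND idea: expressions
// are appended to the current block until the configured bound is hit, then
// a fresh block (a new inner method) is started and chained to.
public class BlockBoundSketch {

    static List<List<String>> splitIntoBlocks(List<String> expressions, int bound) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        blocks.add(current);
        for (String expr : expressions) {
            if (current.size() == bound) {   // bound hit: open a new inner method
                current = new ArrayList<>();
                blocks.add(current);
            }
            current.add(expr);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> exprs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            exprs.add("expr" + i + ";");
        }
        // With a bound of 3, eight expressions split into 3 chained methods.
        System.out.println(splitIntoBlocks(exprs, 3).size()); // prints 3
    }
}
```

Each inner list here corresponds to one generated method body; in the real generator, each block ends with a call into the next generated method.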
> During code generation {{ChainedHashTable}} added all code in its methods in
> one block (using {{BlkCreateMode.FALSE}}) since the {{getHashBuild}} and
> {{getHashProbe}} methods contained state and thus could not be split. In
> these methods a hash was generated for each key expression: for the first key
> the seed was 0, and for each subsequent key the hash was generated based on
> the seed from the previous key.
> To allow splitting these methods, the following was done:
> 1. Method signatures were changed to add a new parameter, {{seedValue}}.
> Initially the starting seed value was hard-coded during code generation (set
> to 0); now it is passed as a method parameter.
> 2. Initially the hash function calls for all keys were combined into one
> logical expression, which did not allow splitting. Now we create a logical
> expression for each key, so splitting is possible. The new {{seedValue}}
> parameter is used as a holder to pass the seed value on to the next key.
> 3. {{ParameterExpression}} was added to generate a reference to a method
> parameter during code generation.
> Code example:
> {noformat}
> public int getHashBuild(int incomingRowIdx, int seedValue)
>     throws SchemaChangeException
> {
>     {
>         NullableVarCharHolder out3 = new NullableVarCharHolder();
>         {
>             out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
>             if (out3 .isSet == 1) {
>                 out3 .buffer = vv0 .getBuffer();
>                 long startEnd = vv0 .getAccessor().getStartEnd((incomingRowIdx));
>                 out3 .start = ((int) startEnd);
>                 out3 .end = ((int)(startEnd >> 32));
>             }
>         }
>         IntHolder seedValue4 = new IntHolder();
>         seedValue4 .value = seedValue;
>         //---- start of eval portion of hash32 function. ----//
>         IntHolder out5 = new IntHolder();
>         {
>             final IntHolder out = new IntHolder();
>             NullableVarCharHolder in = out3;
>             IntHolder seed = seedValue4;
>             Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>                 if (in.isSet == 0) {
>                     out.value = seed.value;
>                 } else {
>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, in.buffer, seed.value);
>                 }
>             }
>             out5 = out;
>         }
>         //---- end of eval portion of hash32 function. ----//
>         seedValue = out5 .value;
>         return getHashBuild0((incomingRowIdx), (seedValue));
>     }
> }
> {noformat}
> Examples of code generation:
> {{HashTableGen5_for_40_columns_BEFORE.java}} - code compiles
> {{HashTableGen5_for_40_columns_AFTER.java}} - code compiles
> {{HashTableGen5_for_1200_columns_BEFORE.java}} - error during compilation,
> method too large
> {{HashTableGen5_for_1200_columns_AFTER.java}} - code compiles since methods
> were split
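The seed-chaining pattern in the generated code can be reduced to a plain-Java analogue. This is a hypothetical sketch, not Drill code: the class name is invented and `hashKey` is a trivial stand-in for `HashHelper.hash32`; only the chaining structure mirrors the generated methods.

```java
// Hypothetical analogue of the split generated code: each block hashes some
// keys into a running seed, then forwards the seed to the next block instead
// of letting a single method grow past the JVM's 64k bytecode limit.
public class SeedChainSketch {

    // Stand-in hash function: mixes one key into the running seed.
    static int hashKey(int key, int seed) {
        return 31 * seed + key;
    }

    // First generated block: hashes the first key (seed starts at 0),
    // then chains to the inner method with the updated seed.
    static int getHashBuild(int rowIdx, int seedValue) {
        seedValue = hashKey(rowIdx, seedValue);
        return getHashBuild0(rowIdx, seedValue);
    }

    // Second generated block: receives the seed as a parameter, so the
    // two methods together behave like one large method.
    static int getHashBuild0(int rowIdx, int seedValue) {
        seedValue = hashKey(rowIdx + 1, seedValue);
        return seedValue;
    }

    public static void main(String[] args) {
        // Equivalent to the unsplit computation hashKey(6, hashKey(5, 0)).
        System.out.println(getHashBuild(5, 0)); // prints 161
    }
}
```

Because the seed is now an explicit parameter rather than hidden state, the code generator is free to cut the method at any key boundary.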
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)