[
https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299263#comment-16299263
]
ASF GitHub Bot commented on DRILL-6028:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1071#discussion_r158164670
--- Diff:
exec/java-exec/src/test/java/org/apache/drill/TestUnionDistinct.java ---
@@ -754,4 +756,37 @@ public void testDrill4147_1() throws Exception {
}
}
+ @Test
+ public void testUnionWithManyColumns() throws Exception {
--- End diff ---
Why would a UNION operator need code for a generated hash table? The union
matches columns, then iterates over multiple result sets. Where would a
run-time hash table fit?
Do we need a different unit test that exercises the hash table code?
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-6028
> URL: https://issues.apache.org/jira/browse/DRILL-6028
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Fix For: 1.13.0
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid "code
> too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]',
> Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
> public boolean isKeyMatchInternalBuild(int incomingRowIdx, int
> htRowIdx)
> ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added the ability to ensure that generated method size does not
> exceed the 64k bytecode limit imposed by the JVM.
> {{BlkCreateMode.TRUE_IF_BOUND}} was added to create a new block only when the
> number of expressions added hits the upper bound defined by
> {{exec.java.compiler.exp_in_method_size}}. Once the number of expressions in a
> method hits the upper bound, we create an inner method and call it from the
> outer one.
> Example:
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe)
>     throws SchemaChangeException {
>   // some logic
>   doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
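The splitting idea above can be sketched in miniature. This is a simplified illustration, not Drill's actual generator: the class name `MethodSplitter`, the `doSetupN` naming, and the bound value are all assumptions for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of the TRUE_IF_BOUND idea: once a method body holds
// BOUND expressions, close it with a call to a fresh inner method and keep
// emitting expressions there, forming a chain of small methods.
public class MethodSplitter {

  // Stand-in for exec.java.compiler.exp_in_method_size (value assumed).
  static final int BOUND = 2;

  static List<String> split(List<String> expressions) {
    List<String> methods = new ArrayList<>();
    StringBuilder body = new StringBuilder();
    int count = 0;
    int next = 0;
    for (String expr : expressions) {
      if (count == BOUND) {
        // Bound hit: delegate the remaining work to a new inner method.
        next++;
        body.append("doSetup").append(next).append("();\n");
        methods.add(body.toString());
        body = new StringBuilder();
        count = 0;
      }
      body.append(expr).append(";\n");
      count++;
    }
    methods.add(body.toString());
    return methods;
  }

  public static void main(String[] args) {
    // Five expressions with a bound of 2 yield a chain of three methods.
    List<String> methods = split(Arrays.asList("e1", "e2", "e3", "e4", "e5"));
    System.out.println(methods.size()); // prints 3
  }
}
```

Each generated method stays under the expression bound, so no single method's bytecode can approach the 64k limit regardless of how many expressions the query produces.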
> During code generation {{ChainedHashTable}} added all code in its methods in
> one block (using {{BlkCreateMode.FALSE}}), since the {{getHashBuild}} and
> {{getHashProbe}} methods carried state and thus could not be split. In these
> methods a hash was generated for each key expression: the first key used seed
> 0, and each subsequent key's hash was generated using the previous key's hash
> as its seed.
> To allow splitting these methods, the following was done:
> 1. Method signatures were changed: a new parameter {{seedValue}} was added.
> Initially the starting seed value was hard-coded to 0 during code generation;
> now it is passed as a method parameter.
> 2. Initially the hash function calls for all keys were transformed into one
> logical expression, which did not allow splitting. Now we create a logical
> expression for each key, so splitting becomes possible. The new {{seedValue}}
> parameter acts as a seed holder that passes the seed value on to the next key.
> 3. {{ParameterExpression}} was added to generate a reference to a method
> parameter during code generation.
> Code example:
> {noformat}
> public int getHashBuild(int incomingRowIdx, int seedValue)
>     throws SchemaChangeException {
>   {
>     NullableVarCharHolder out3 = new NullableVarCharHolder();
>     {
>       out3.isSet = vv0.getAccessor().isSet((incomingRowIdx));
>       if (out3.isSet == 1) {
>         out3.buffer = vv0.getBuffer();
>         long startEnd = vv0.getAccessor().getStartEnd((incomingRowIdx));
>         out3.start = ((int) startEnd);
>         out3.end = ((int) (startEnd >> 32));
>       }
>     }
>     IntHolder seedValue4 = new IntHolder();
>     seedValue4.value = seedValue;
>     //---- start of eval portion of hash32 function. ----//
>     IntHolder out5 = new IntHolder();
>     {
>       final IntHolder out = new IntHolder();
>       NullableVarCharHolder in = out3;
>       IntHolder seed = seedValue4;
>       Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>         if (in.isSet == 0) {
>           out.value = seed.value;
>         } else {
>           out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(
>               in.start, in.end, in.buffer, seed.value);
>         }
>       }
>       out5 = out;
>     }
>     //---- end of eval portion of hash32 function. ----//
>     seedValue = out5.value;
>     return getHashBuild0((incomingRowIdx), (seedValue));
>   }
> }
> {noformat}
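The chained-seed scheme in the generated code above can be distilled into a small sketch. This is illustrative only: the `hash32` below is a stand-in mixer, not Drill's `HashHelper.hash32`, and the two-method chain merely mimics the shape of the generated `getHashBuild`/`getHashBuild0` pair.

```java
// Illustrative sketch of seed chaining: each key is hashed with the previous
// key's hash as its seed, so per-key hash expressions can live in separate
// generated methods that hand seedValue along. hash32 is a stand-in, not
// Drill's HashHelper.hash32.
public class SeedChain {

  static int hash32(int value, int seed) {
    int h = seed ^ value;
    h *= 0x85ebca6b;          // mixing constant borrowed from MurmurHash3
    return h ^ (h >>> 13);
  }

  // Pre-split form: one big method hashes every key in sequence.
  static int hashAllKeys(int[] keys) {
    int seedValue = 0;                      // first key starts from seed 0
    for (int key : keys) {
      seedValue = hash32(key, seedValue);   // output seeds the next key
    }
    return seedValue;
  }

  // Split form: each method hashes one key, then delegates with the new
  // seed, mirroring getHashBuild(...) -> getHashBuild0(...).
  static int getHashBuild(int key0, int key1) {
    int seedValue = hash32(key0, 0);
    return getHashBuild0(key1, seedValue);
  }

  static int getHashBuild0(int key1, int seedValue) {
    return hash32(key1, seedValue);
  }
}
```

Because the seed travels as an explicit parameter rather than implicit state inside one expression, the generator is free to cut the chain between any two keys, which is exactly what makes the split possible.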
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)