[
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950050#comment-15950050
]
ASF GitHub Bot commented on DRILL-5125:
---------------------------------------
Github user jinfengni commented on a diff in the pull request:
https://github.com/apache/drill/pull/704#discussion_r109051708
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java
---
@@ -374,6 +374,19 @@ public void copyFromSafe(int fromIndex, int thisIndex,
Nullable${minor.class}Vec
values.copyFromSafe(fromIndex, thisIndex, from.values);
}
+ @Override
+ public void copyEntry(int toIndex, ValueVector from, int fromIndex) {
+ Nullable${minor.class}Vector fromVector =
(Nullable${minor.class}Vector) from;
+ <#if type.major == "VarLen">
+
+ // This method is to be called only for loading the vector
+ // sequentially, so there should be no empties to fill.
+
+ </#if>
+ bits.copyFromSafe(fromIndex, toIndex, fromVector.bits);
+ values.copyFromSafe(fromIndex, toIndex, fromVector.values);
--- End diff --
IntVector.copyEntry() will call DrillBuf.getBytes(....), yet
NullableIntVector.copyEntry() will call IntVector.copyFromSafe() which seems to
go through different code path. Is there a reason for such difference?
> Provide option to use generic code for sv remover
> -------------------------------------------------
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Labels: ready-to-commit
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields.
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00 Screen
> 00-01 ProjectAllowDup(*=[$0], rn=[$1])
> 00-02 Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[=($2, 10)])
> 00-05 Window(window#0=[window(partition {} order by [1] rows
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06 SelectionVectorRemover
> 00-07 Sort(sort0=[$1], dir0=[ASC])
> 00-08 Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09 Scan(groupscan=...)
> {code}
> Instrumenting, the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes
> (2159 ms).
> The classes themselves are very simple: create member variables for 486
> vectors (2 x column count), and call a method on each to do the copy. The
> only type-specific work is the member variable and call to the (non-virtual)
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced
> by a “generic” class and virtual functions in the vector classes to choose
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times with a first-run
> savings. Here are the last 8 runs (out of 10), with code cached turned off
> (forcing a compile on each query run), with and without the generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained
> on a version of the code that already had various optimizations described in
> other JIRA entries.)
> The reason, for generating code is that one would expect that 243 in-line
> statements (an unwound loop) to be faster than a loop with 243 iterations. In
> addition, the generic version uses an array in place of ~500 variables, and a
> virtual function call rather than in-line, type-specific calls. One would
> expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no
> compile cost is payed for either code path (because the test excludes the
> first two runs in which the generated code is compiled.)
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In
> this instance, the array/loop/virtual function version is ~260 ms faster
> (20%).
> The test shows that the code can be simplified, a costly costly code-gen and
> compile step can be skipped, and this query will go faster. Plus, since the
> change removes generated classes from the code cache, there is more room for
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in
> comments.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)