[ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931381#comment-15931381
 ] 

ASF GitHub Bot commented on DRILL-5125:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/704#discussion_r106789976
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/GenericSV4Copier.java
 ---
    @@ -0,0 +1,67 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.svremover;
    +
    +import org.apache.drill.exec.ops.FragmentContext;
    +import org.apache.drill.exec.record.RecordBatch;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.ValueVector;
    +
    +/**
    + * Generic selection vector 4 copier implementation that can
    + * be used in place of the generated version. Relies on a
    + * virtual function in each value vector to choose the proper
    + * implementation. Tests suggest that this version performs
    + * better than the generated version for queries with many columns.
    + */
    +
    +public class GenericSV4Copier extends CopierTemplate4 {
    +
    +  private ValueVector[] vvOut;
    +  private ValueVector[][] vvIn;
    +
    +  @SuppressWarnings("unused")
    +  @Override
    +  public void doSetup(FragmentContext context, RecordBatch incoming,
    +      RecordBatch outgoing) {
    --- End diff --
    
    Fixed


> Provide option to use generic code for sv remover
> -------------------------------------------------
>
>                 Key: DRILL-5125
>                 URL: https://issues.apache.org/jira/browse/DRILL-5125
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00    Screen
> 00-01      ProjectAllowDup(*=[$0], rn=[$1])
> 00-02        Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($2, 10)])
> 00-05              Window(window#0=[window(partition {} order by [1] rows 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$1], dir0=[ASC])
> 00-08                    Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09                      Scan(groupscan=...)
> {code}
> Instrumenting, the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable and call to the (non-virtual) 
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced 
> by a “generic” class and virtual functions in the vector classes to choose 
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times with a first-run 
> savings. Here are the last 8 runs (out of 10), with code cached turned off 
> (forcing a compile on each query run), with and without the generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated 
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained 
> on a version of the code that already had various optimizations described in 
> other JIRA entries.)
> The reason, for generating code is that one would expect that 243 in-line 
> statements (an unwound loop) to be faster than a loop with 243 iterations. In 
> addition, the generic version uses an array in place of ~500 variables, and a 
> virtual function call rather than in-line, type-specific calls. One would 
> expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no 
> compile cost is payed for either code path (because the test excludes the 
> first two runs in which the generated code is compiled.)
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In 
> this instance, the array/loop/virtual function version is ~260 ms faster 
> (20%).
> The test shows that the code can be simplified, a costly costly code-gen and 
> compile step can be skipped, and this query will go faster. Plus, since the 
> change removes generated classes from the code cache, there is more room for 
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in 
> comments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to