[
https://issues.apache.org/jira/browse/DRILL-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kunal Khatua resolved DRILL-6211.
---------------------------------
Resolution: Fixed
This has been verified as part of the Lateral Unnest feature commits.
|| Selectivity || *Drill-1.13* (ms) || %Used by SVR || Est SVR Time (ms) || *Drill-1.14* (ms) || %Used by SVR || Est SVR Time (ms) ||
| 0% | 5935 | 0.08% | 4.75 | 4,584 | 0.14% | 6.42 |
| 10% | 6665 | 7.51% | 500.54 | 4,972 | 0.12% | 5.97 |
| 20% | 7512 | 13.22% | 993.09 | 5,187 | 0.14% | 7.26 |
| 30% | 7814 | 19.03% | 1487.00 | 5,432 | 0.20% | 10.86 |
| 40% | 8827 | 22.06% | 1947.24 | 5,579 | 0.16% | 8.93 |
| 50% | 9499 | 25.36% | 2408.95 | 5,739 | 0.17% | 9.76 |
| 60% | 10108 | 28.63% | 2893.92 | 5,823 | 0.18% | 10.48 |
| 70% | 10624 | 31.47% | 3343.37 | 6,096 | 0.19% | 11.58 |
| 80% | 11342 | 33.58% | 3808.64 | 6,266 | 0.20% | 12.53 |
| 90% | 12088 | 35.40% | 4279.15 | 6,324 | 0.21% | 13.28 |
| 100% | 12741 | 37.42% | 4767.68 | 6,250 | 0.23% | 14.38 |
> Optimizations for SelectionVectorRemover
> -----------------------------------------
>
> Key: DRILL-6211
> URL: https://issues.apache.org/jira/browse/DRILL-6211
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Reporter: Kunal Khatua
> Assignee: Karthikeyan Manivannan
> Priority: Major
> Fix For: 1.15.0
>
> Attachments: 255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill,
> 255d2664-2418-19e0-00ea-2076a06572a2.sys.drill,
> 255d2682-8481-bed0-fc22-197a75371c04.sys.drill,
> 255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill,
> 255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill
>
>
> Currently, when a SelectionVectorRemover receives a record batch from an
> upstream operator (like a Filter), it immediately starts copying over records
> into a new outgoing batch.
> It could be worthwhile to enrich the RecordBatch with some additional
> summary statistics about the attached SelectionVector, such as:
> # number of records that need to be removed/copied
> # total number of records in the record-batch
> The benefit would be that, in the extreme cases where *all* the records in a
> batch need to be either removed or copied, the SelectionVectorRemover can
> simply drop the record batch or forward it unchanged to the next downstream
> operator.
> While the extreme case of simply dropping the batch already works reasonably
> well (because no copying is done), the copy overhead remains in cases where
> the record batch should pass through unchanged, and it accounts for more than
> 35% of query time once the streaming-agg cost within the tests is discounted.
> Here are the statistics motivating such an optimization:
> ||Selectivity||Query Time (s)||%Time used by SVR||SVR Time (s)||Profile||
> |0%|6.996|0.13%|0.0090948|[^255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill]|
> |10%|7.836|7.97%|0.6245292|[^255d2682-8481-bed0-fc22-197a75371c04.sys.drill]|
> |50%|11.225|25.59%|2.8724775|[^255d2664-2418-19e0-00ea-2076a06572a2.sys.drill]|
> |90%|14.966|33.91%|5.0749706|[^255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill]|
> |100%|19.003|35.73%|6.7897719|[^255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill]|
> To summarize, the SVR should avoid creating new batches as much as possible.
> A more general (non-trivial) optimization would also coalesce multiple
> emitted batches, but we do not currently have test metrics for that.
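The short-circuit described in the issue can be sketched as below. This is an illustrative sketch only, not the actual Drill implementation: the class name `SvrShortCircuit`, the `decide()` method, and the two summary statistics (total records, selected records) are hypothetical stand-ins for the enriched RecordBatch metadata the issue proposes.

```java
// Sketch of the proposed SelectionVectorRemover short-circuit (hypothetical
// names; not Drill's actual API). Given the two proposed summary statistics
// on an incoming batch, the SVR can avoid a copy in the two extreme cases.
public class SvrShortCircuit {
    enum Action { DROP_BATCH, PASS_THROUGH, COPY_SELECTED }

    static Action decide(int totalRecords, int selectedRecords) {
        if (selectedRecords == 0) {
            return Action.DROP_BATCH;    // nothing survived the filter: drop
        }
        if (selectedRecords == totalRecords) {
            return Action.PASS_THROUGH;  // everything survived: forward as-is
        }
        return Action.COPY_SELECTED;     // partial selectivity: copy as today
    }

    public static void main(String[] args) {
        System.out.println(decide(4096, 0));     // DROP_BATCH
        System.out.println(decide(4096, 4096));  // PASS_THROUGH
        System.out.println(decide(4096, 1024));  // COPY_SELECTED
    }
}
```

Only the 0% and 100% selectivity rows in the tables above hit the cheap paths; every intermediate selectivity still pays the copy cost.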
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)