[
https://issues.apache.org/jira/browse/DRILL-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kunal Khatua resolved DRILL-6211.
---------------------------------
Resolution: Fixed
This has been verified as part of the Lateral Unnest feature commits.
|| Selectivity || *Drill-1.13* (ms) || %Used by SVR || Est SVR Time (ms) || *Drill-1.14* (ms) || %Used by SVR || Est SVR Time (ms) ||
| 0% | 5935 | 0.08% | 4.75 | 4,584 | 0.14% | 6.42 |
| 10% | 6665 | 7.51% | 500.54 | 4,972 | 0.12% | 5.97 |
| 20% | 7512 | 13.22% | 993.09 | 5,187 | 0.14% | 7.26 |
| 30% | 7814 | 19.03% | 1487.00 | 5,432 | 0.20% | 10.86 |
| 40% | 8827 | 22.06% | 1947.24 | 5,579 | 0.16% | 8.93 |
| 50% | 9499 | 25.36% | 2408.95 | 5,739 | 0.17% | 9.76 |
| 60% | 10108 | 28.63% | 2893.92 | 5,823 | 0.18% | 10.48 |
| 70% | 10624 | 31.47% | 3343.37 | 6,096 | 0.19% | 11.58 |
| 80% | 11342 | 33.58% | 3808.64 | 6,266 | 0.20% | 12.53 |
| 90% | 12088 | 35.40% | 4279.15 | 6,324 | 0.21% | 13.28 |
| 100% | 12741 | 37.42% | 4767.68 | 6,250 | 0.23% | 14.38 |
> Optimizations for SelectionVectorRemover
> -----------------------------------------
>
> Key: DRILL-6211
> URL: https://issues.apache.org/jira/browse/DRILL-6211
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Reporter: Kunal Khatua
> Assignee: Karthikeyan Manivannan
> Priority: Major
> Fix For: 1.15.0
>
> Attachments: 255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill,
> 255d2664-2418-19e0-00ea-2076a06572a2.sys.drill,
> 255d2682-8481-bed0-fc22-197a75371c04.sys.drill,
> 255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill,
> 255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill
>
>
> Currently, when a SelectionVectorRemover receives a record batch from an
> upstream operator (like a Filter), it immediately starts copying over records
> into a new outgoing batch.
> It could be worthwhile to enrich the RecordBatch with some additional
> summary statistics about the attached SelectionVector, such as:
> # number of records that need to be removed/copied
> # total number of records in the record-batch
> The benefit would be that, in the extreme cases where *all* the records in a
> batch need to be either removed or copied, the SelectionVectorRemover can
> simply drop the record batch or forward it unchanged to the next downstream
> operator.
> While the extreme case of simply dropping the batch already works reasonably
> well (because no copying is done), the copy overhead remains in cases where
> the record batch should pass through unchanged, and it accounts for more than
> 35% of query time once the streaming-agg cost within the tests is discounted.
> Here are the statistics motivating such an optimization:
> ||Selectivity||Query Time (s)||%Time used by SVR||SVR Time (s)||Profile||
> |0%|6.996|0.13%|0.0090948|[^255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill]|
> |10%|7.836|7.97%|0.6245292|[^255d2682-8481-bed0-fc22-197a75371c04.sys.drill]|
> |50%|11.225|25.59%|2.8724775|[^255d2664-2418-19e0-00ea-2076a06572a2.sys.drill]|
> |90%|14.966|33.91%|5.0749706|[^255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill]|
> |100%|19.003|35.73%|6.7897719|[^255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill]|
> To summarize, the SVR should avoid creating new batches as much as possible.
> A more general (non-trivial) optimization would also coalesce multiple
> emitted batches, but we do not currently have test metrics for that.
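The short-circuit described in the issue can be sketched as below. This is an illustrative sketch only, not the actual Drill implementation: the class name `SvrShortCircuit`, the `decide()` method, and the two summary statistics (total records, selected records) are hypothetical stand-ins for the enriched RecordBatch metadata the issue proposes.

```java
// Sketch of the proposed SelectionVectorRemover short-circuit (hypothetical
// names; not Drill's actual API). Given the two proposed summary statistics
// on an incoming batch, the SVR can avoid a copy in the two extreme cases.
public class SvrShortCircuit {
    enum Action { DROP_BATCH, PASS_THROUGH, COPY_SELECTED }

    static Action decide(int totalRecords, int selectedRecords) {
        if (selectedRecords == 0) {
            return Action.DROP_BATCH;    // nothing survived the filter: drop
        }
        if (selectedRecords == totalRecords) {
            return Action.PASS_THROUGH;  // everything survived: forward as-is
        }
        return Action.COPY_SELECTED;     // partial selectivity: copy as today
    }

    public static void main(String[] args) {
        System.out.println(decide(4096, 0));     // DROP_BATCH
        System.out.println(decide(4096, 4096));  // PASS_THROUGH
        System.out.println(decide(4096, 1024));  // COPY_SELECTED
    }
}
```

Only the 0% and 100% selectivity rows in the tables above hit the cheap paths; every intermediate selectivity still pays the copy cost.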
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)