[ https://issues.apache.org/jira/browse/SPARK-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-16331.
---------------------------------
Resolution: Fixed
Assignee: Hiroshi Inoue
Fix Version/s: 2.1.0
> [SQL] Reduce code generation time
> ----------------------------------
>
> Key: SPARK-16331
> URL: https://issues.apache.org/jira/browse/SPARK-16331
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Hiroshi Inoue
> Assignee: Hiroshi Inoue
> Fix For: 2.1.0
>
>
> During code generation, a {{LocalRelation}} often holds a huge {{Vector}}
> object as its {{data}}. In the simple example below, a {{LocalRelation}} has a
> {{Vector}} with 1000000 {{UnsafeRow}} elements.
> {quote}
> val numRows = 1000000
> val ds = (1 to numRows).toDS().persist()
> benchmark.addCase("filter+reduce") { iter =>
> ds.filter(a => (a & 1) == 0).reduce(_ + _)
> }
> {quote}
> In {{TreeNode.transformChildren}}, all elements of the vector are
> unnecessarily iterated to check whether any of them are children of the node,
> since {{Vector}} is Traversable. This significantly increases code
> generation time.
> This patch avoids this overhead by checking the number of children before
> iterating all elements; {{LocalRelation}} does not have children since it
> extends {{LeafNode}}.
> The performance of the above example:
> {quote}
> without this patch
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
> compilationTime:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> filter+reduce            4426 / 4533           0.2        4426.0       1.0X
> with this patch
> compilationTime:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> filter+reduce            3117 / 3391           0.3        3116.6       1.0X
> {quote}
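
The idea described in the issue (skip the traversal when a node has no children) can be illustrated with a minimal Scala sketch. This is a simplified model, not Spark's actual {{TreeNode}} code: the names {{Node}}, {{Leaf}}, {{args}}, {{visitChildrenNaive}}, {{visitChildrenGuarded}} and the {{inspected}} counter are hypothetical and exist only for this illustration. {{Iterable}} is used in place of {{Traversable}} so the sketch also compiles on newer Scala versions.

```scala
// Simplified, hypothetical model; this is NOT Spark's TreeNode implementation.
object TransformChildrenSketch {
  // Counts how many collection elements were inspected while looking for children.
  var inspected: Long = 0L

  trait Node {
    def children: Seq[Node]
    // Constructor arguments: may contain child nodes or plain data.
    def args: Seq[Any]
  }

  // A leaf with a large data payload, loosely standing in for LocalRelation.
  case class Leaf(data: Vector[Int]) extends Node {
    def children: Seq[Node] = Nil
    def args: Seq[Any] = Seq(data)
  }

  // Naive version: walks every element of every collection argument,
  // even when the node cannot have any children.
  def visitChildrenNaive(node: Node, f: Node => Unit): Unit =
    node.args.foreach {
      case n: Node => f(n)
      case xs: Iterable[_] =>
        xs.foreach { x =>
          inspected += 1
          x match {
            case n: Node => f(n)
            case _       => ()
          }
        }
      case _ => ()
    }

  // Guarded version: checks the number of children first and skips
  // the traversal entirely for leaf nodes.
  def visitChildrenGuarded(node: Node, f: Node => Unit): Unit =
    if (node.children.nonEmpty) visitChildrenNaive(node, f)

  def main(args: Array[String]): Unit = {
    val leaf = Leaf(Vector.range(0, 1000000))

    inspected = 0
    visitChildrenNaive(leaf, _ => ())
    println(s"naive: inspected $inspected elements")

    inspected = 0
    visitChildrenGuarded(leaf, _ => ())
    println(s"guarded: inspected $inspected elements")
  }
}
```

In this sketch the naive visitor touches every element of the leaf's data vector, while the guarded one touches none. The sketch only counts inspected elements; it does not reproduce the timings reported above.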