[
https://issues.apache.org/jira/browse/SPARK-23247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-23247.
---------------------------------
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 20415
[https://github.com/apache/spark/pull/20415]
> combines Unsafe operations and statistics operations in Scan Data Source
> ------------------------------------------------------------------------
>
> Key: SPARK-23247
> URL: https://issues.apache.org/jira/browse/SPARK-23247
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: caoxuewen
> Assignee: caoxuewen
> Priority: Major
> Fix For: 2.4.0
>
>
> Currently, we scan the execution plan of the data source, first the unsafe
> operation of each row of data, and then re traverse the data for the count of
> rows. In terms of performance, this is not necessary. this PR combines the
> two operations and makes statistics on the number of rows while performing
> the unsafe operation.
> *Before modified,*
> {color:#cc7832}val {color}unsafeRow = rdd.mapPartitionsWithIndexInternal {
> (index{color:#cc7832}, {color}iter) =>
> {color:#cc7832}val {color}proj =
> UnsafeProjection.create({color:#9876aa}schema{color})
> proj.initialize(index)
> {color:#FF0000}iter.map(proj){color}
> }
> {color:#cc7832}val {color}numOutputRows =
> longMetric({color:#6a8759}"numOutputRows"{color})
> unsafeRow.map { r =>
> {color:#FF0000}numOutputRows += {color}{color:#6897bb}{color:#FF0000}1{color}
> {color} r
> }
> *After modified,*
> val numOutputRows = longMetric("numOutputRows")
> rdd.mapPartitionsWithIndexInternal { (index, iter) =>
> val proj = UnsafeProjection.create(schema)
> proj.initialize(index)
> iter.map( r => {
> {color:#FF0000} numOutputRows += 1{color}
> {color:#FF0000} proj(r){color}
> })
> }
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]