[jira] [Resolved] (SPARK-5863) Improve performance of convertToScala codepath.

Michael Armbrust (JIRA) Sun, 12 Apr 2015 11:48:54 -0700

     [ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael Armbrust resolved SPARK-5863.
-------------------------------------
    Resolution: Fixed

A little more background: the reason for the performance regression is that 
we were not properly doing the conversion in previous releases. Doing it 
correctly is more important in 1.2 because we added more internal types to the 
execution layer (Decimal). I think the correct solution performance-wise is 
done in [SPARK-6620], but this is a pretty huge change that we would not want 
to backport to a maintenance branch. It is not clear to me that making things 
into iterators would fix the performance problem, or that there is a simple 
solution at all. Since we have a fix for the next release, I'm going to close 
this issue.

If you want to investigate further and find a surgical fix that does improve 
performance in a benchmark, please feel free to reopen.
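The comment above attributes the cost to converting internal types such as Decimal correctly. One general way to keep that conversion cheap, shown here with no claim about how SPARK-6620 actually implements it, is to resolve the type dispatch once per schema and reuse the resulting functions for every row. In this minimal Scala sketch, `DataType`, `ArrayType`, `converterFor`, and `rowConverter` are hypothetical names, and internal decimals are assumed to be `java.math.BigDecimal` values purely for illustration.

```scala
// Hypothetical stand-ins for Spark SQL schema types (not Spark's API).
sealed trait DataType
case object IntType extends DataType
case object DecimalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object ConverterSketch {
  // Resolves the match on `dt` once; the returned function does no type dispatch.
  def converterFor(dt: DataType): Any => Any = dt match {
    case DecimalType =>
      val convertDecimal: Any => Any = {
        case d: java.math.BigDecimal => BigDecimal(d) // assumed internal representation
        case other                   => other         // null passes through unchanged
      }
      convertDecimal
    case ArrayType(elementType) =>
      val convertElement = converterFor(elementType) // built once, reused for every element
      val convertArray: Any => Any = {
        case xs: Seq[_] => xs.map(convertElement)
        case other      => other
      }
      convertArray
    case _ =>
      val passThrough: Any => Any = v => v
      passThrough
  }

  // Builds all field converters once per schema; only the returned
  // closure runs per row.
  def rowConverter(fieldTypes: Seq[DataType]): IndexedSeq[Any] => Array[Any] = {
    val converters: Array[Any => Any] = fieldTypes.map(converterFor).toArray
    row => {
      val out = new Array[Any](converters.length)
      var i = 0
      while (i < converters.length) {
        out(i) = converters(i)(row(i))
        i += 1
      }
      out
    }
  }
}
```

The design choice is to pay for pattern matching on the schema once, when `rowConverter` is called, so the per-row cost is one array allocation plus one function call per field.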

> Improve performance of convertToScala codepath.
> -----------------------------------------------
>
>                 Key: SPARK-5863
>                 URL: https://issues.apache.org/jira/browse/SPARK-5863
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.2.0, 1.2.1
>            Reporter: Cristian
>            Priority: Critical
>
> I was doing some performance testing on reading Parquet files and noticed 
> that performance is 3x worse after moving from Spark 1.1 to 1.2. In the 
> profiler, the culprit showed up in ScalaReflection.convertRowToScala.
> In particular, this zip is the issue:
> {code}
> r.toSeq.zip(schema.fields.map(_.dataType))
> {code}
> There is already a comment there noting that this is slow, but it was never 
> fixed. It actually produces a 3x degradation in Parquet read performance, at 
> least in my test case.
> Edit: the map is part of the issue as well. This whole code block runs in a 
> tight loop and allocates a new ListBuffer, which has to grow, for each 
> transformation. A possible solution is to use seq.view instead, which would 
> allocate iterators rather than intermediate collections.


