Cristian O created SPARK-5863:
---------------------------------

             Summary: Performance regression in Spark SQL/Parquet due to ScalaReflection.convertRowToScala
                 Key: SPARK-5863
                 URL: https://issues.apache.org/jira/browse/SPARK-5863
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1, 1.2.0
            Reporter: Cristian O


I was doing some perf testing on reading Parquet files and noticed that, after 
moving from Spark 1.1 to 1.2, read performance is 3x worse. The profiler 
showed the culprit to be ScalaReflection.convertRowToScala.

In particular, this zip is the issue:

{code}
r.toSeq.zip(schema.fields.map(_.dataType))
{code}

I see there is currently a comment on that code noting that it is slow, but it 
was never fixed. It actually causes a 3x degradation in Parquet read 
performance, at least in my test case.
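
To illustrate the concern, here is a minimal sketch of why a per-row zip can be 
costly and what an index-based alternative might look like. Everything in it 
(FieldType, convertValue, RowConversion) is a hypothetical stand-in rather than 
Spark's actual classes, and it is not meant as the actual fix, only as a way to 
show the allocation pattern.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait FieldType
case object IntField extends FieldType
case object StrField extends FieldType

object RowConversion {
  // Placeholder per-value conversion (assumption: the real conversion
  // dispatches on the field's data type in a similar way).
  def convertValue(v: Any, t: FieldType): Any = t match {
    case IntField => v
    case StrField => v.toString
  }

  // Zip-based: for every row this builds a sequence of tuples (one tuple per
  // field) and a mapped sequence before the final array is produced.
  def convertWithZip(row: Seq[Any], types: Seq[FieldType]): Array[Any] =
    row.zip(types).map { case (v, t) => convertValue(v, t) }.toArray

  // Index-based: the caller computes the types array once per schema; each row
  // then needs a single output array and no tuples.
  def convertWithLoop(row: IndexedSeq[Any], types: Array[FieldType]): Array[Any] = {
    val out = new Array[Any](row.length)
    var i = 0
    while (i < row.length) {
      out(i) = convertValue(row(i), types(i))
      i += 1
    }
    out
  }
}
```

Both functions should return the same values for the same input. The 
difference is only in how many temporary objects are created per row, which 
adds up when the conversion runs once for every row read.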




