[ 
https://issues.apache.org/jira/browse/SPARK-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654696#comment-14654696
 ] 

Wenchen Fan commented on SPARK-9604:
------------------------------------

There is a known issue:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala#L156-L161

I used a quick solution that turns unsafe arrays/maps into safe ones, which is
very inefficient. The reason is that we may have UnsafeRows inside arrays and
maps, and we have to turn them into their safe versions because `toSeq` is
currently not supported in UnsafeRow.

I'm working on changing `toSeq` to `toSeq(schema: StructType)` so that it can 
work on UnsafeRow. After that we can remove the unsafe->safe conversion and 
solve this issue.
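
To illustrate why `toSeq` needs a schema on an unsafe row, here is a minimal toy sketch. It does not use Spark's real classes: `ToySchema`, `ToyUnsafeRow` and the `FieldType` cases are hypothetical stand-ins, and the one-8-byte-word-per-field layout is a simplifying assumption. The point is only that a row holding raw words has no type information of its own, so the caller has to supply it.

```scala
// Toy model only: every name below is hypothetical, not Spark's API.
sealed trait FieldType
case object IntField extends FieldType
case object LongField extends FieldType
case object DoubleField extends FieldType

// Stand-in for StructType: an ordered list of field types.
final case class ToySchema(fields: Seq[FieldType])

// Stand-in for an unsafe row: each field occupies one 8-byte word,
// and the row itself does not record what type each word holds.
final class ToyUnsafeRow(words: Array[Long]) {
  def getInt(i: Int): Int = words(i).toInt
  def getLong(i: Int): Long = words(i)
  def getDouble(i: Int): Double =
    java.lang.Double.longBitsToDouble(words(i))

  // The schema tells us how to decode each word, so it is passed in
  // by the caller instead of being stored in the row.
  def toSeq(schema: ToySchema): Seq[Any] =
    schema.fields.zipWithIndex.map {
      case (IntField, i)    => getInt(i)
      case (LongField, i)   => getLong(i)
      case (DoubleField, i) => getDouble(i)
    }
}
```

With something of this shape, a caller that already knows the schema can read values straight out of the unsafe row, without first copying it into a safe representation.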

> Unsafe ArrayData and MapData is very very slow
> ----------------------------------------------
>
>                 Key: SPARK-9604
>                 URL: https://issues.apache.org/jira/browse/SPARK-9604
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Davies Liu
>            Assignee: Wenchen Fan
>            Priority: Blocker
>
> After the unsafe ArrayData and MapData were merged in, this test became very
> slow (from less than 1 second to more than 35 seconds).
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/3157/testReport/org.apache.spark.sql.columnar/InMemoryColumnarQuerySuite/test_different_data_types/history/
> I tried disabling the cache, but it is still very slow (almost the same).
> Once ArrayData and ArrayMap are removed, it becomes much faster (though it
> still takes about 10 seconds).
> Related changes: 
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/3148/changes
> Also, the duration of the Hive tests increased from 32 min to 45 min:
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/3154/testReport/junit/org.apache.spark.sql.hive.execution/history/
> cc [~rxin]


