[ https://issues.apache.org/jira/browse/SPARK-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-17366.
-------------------------------
Resolution: Invalid
Yes, please start on the user@ mailing list.
> Temp tables cached in spark - Joins performance
> -----------------------------------------------
>
> Key: SPARK-17366
> URL: https://issues.apache.org/jira/browse/SPARK-17366
> Project: Spark
> Issue Type: Brainstorming
> Components: SQL
> Environment: Amazon S3
> Reporter: Chris Sanjiv Xavier
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Hi,
> I have a use case in which we run Spark on an Amazon EC2 instance. We pull
> data from an S3 bucket into DataFrames and then cache the tables.
> We face severe performance problems when we join the two cached tables:
> the join runs very slowly.
> Example of the issue:
> Table A in memory: 1000 MB
> Table B in memory: 1000 MB
> Data is pulled using the SQL interface of a Zeppelin notebook on Amazon.
> SELECT * FROM A INNER JOIN B ON A.column1 = B.column1
> WHERE B.column2 = 'SPARK';
> The above query returns results extremely slowly.
> This is a Spark cluster with 6 nodes holding close to 250 GB of memory in total.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)