[jira] [Resolved] (SPARK-17366) Temp tables cached in spark - Joins performance

Sean Owen (JIRA) Fri, 02 Sep 2016 01:11:40 -0700

     [ https://issues.apache.org/jira/browse/SPARK-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen resolved SPARK-17366.
-------------------------------
    Resolution: Invalid

Yes, please start on the user@ mailing list.

> Temp tables cached in spark - Joins performance
> -----------------------------------------------
>
>                 Key: SPARK-17366
>                 URL: https://issues.apache.org/jira/browse/SPARK-17366
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>         Environment: Amazon S3
>            Reporter: Chris Sanjiv Xavier
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Hi,
> I have a use case in which Spark runs on an Amazon EC2 instance. We pull
> data from an S3 bucket into DataFrames and then cache the tables.
> We see serious performance problems when we join two of the cached tables:
> the join runs very slowly.
> Example of the issue:
> Table A in memory: 1000 MB
> Table B in memory: 1000 MB
> We query the data through the SQL interface of a Zeppelin notebook on Amazon:
> SELECT * FROM A INNER JOIN B ON A.column1 = B.column1 WHERE
> B.column2 = 'SPARK';
> The above query returns results extremely slowly.
> This is a Spark cluster with 6 nodes and close to 250 GB of memory in total.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
