Re: Spark SQL API taking longer time than DF API.

2019-04-17 Thread Yeikel
Please share the results of df.explain()[1] for both cases. That should give us
some clues about what the differences are.

[1]https://github.com/apache/spark/blob/e1c90d66bbea5b4cb97226610701b0389b734651/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L499
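
A minimal sketch of how both plans could be collected from PySpark, assuming an existing SparkSession is passed in as `spark`. The helper name `explain_both` and its arguments are hypothetical and not from this thread. It relies on `EXPLAIN EXTENDED` returning a one-row result that holds the plan text, and on `df.explain(True)` printing the extended plan.

```python
def explain_both(spark, ctas_sql, df):
    """Print the extended plans for Case 1 (SQL text) and Case 2 (DataFrame).

    spark    -- an existing SparkSession (assumed to be created elsewhere)
    ctas_sql -- the full CREATE TABLE ... AS ... statement as a string
    df       -- the DataFrame that would be written with saveAsTable()
    """
    # EXPLAIN does not execute the statement; it returns a single row
    # whose only column contains the plan as text.
    sql_plan = spark.sql("EXPLAIN EXTENDED " + ctas_sql).collect()[0][0]
    print("=== Case 1: Spark SQL ===")
    print(sql_plan)

    print("=== Case 2: DataFrame API ===")
    # True asks for the parsed, analyzed, optimized and physical plans.
    df.explain(True)
    return sql_plan

# Hypothetical usage inside a notebook or pyspark shell:
#   explain_both(spark, "CREATE TABLE some_table AS SELECT ...", df3)
```

Note that the DataFrame plan printed this way covers only the query, not the write step, so the SQL tab in the Spark UI is still the better place to compare the two writes.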






Re: Spark SQL API taking longer time than DF API.

2019-04-08 Thread chris
Hi,

Without more information it's very difficult to work out what's going on. If
possible, could you do the following and share the results with us?

1) For each query, call explain() and post the output.

2) Run each query, then go to the SQL tab in the Spark UI and show us the
plan for each query.

3) For each query, post some screenshots of the tasks page in the Spark UI.

(In all of the above, make sure to redact any sensitive information!)

You are right in thinking that the queries should be identical. My hunch
is that something subtle is making them not quite identical, and the above
information should allow us to figure out what.
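
Once both explain() outputs have been saved as text, a plain diff can help spot a subtle difference. The following is only a sketch using Python's standard `difflib`. The function name `diff_plans` is hypothetical, and stripping the `#123`-style expression IDs is an assumption about what is noise in the plan text, since those numbers usually differ between runs.

```python
import difflib
import re

def diff_plans(sql_plan, df_plan):
    """Return unified-diff lines between two explain() dumps.

    Expression IDs such as `col#12` or `count#45L` are replaced by `#`
    first, so that only structural differences remain.
    """
    def normalize(plan):
        return re.sub(r"#\d+L?", "#", plan).splitlines()

    return list(difflib.unified_diff(
        normalize(sql_plan),
        normalize(df_plan),
        fromfile="spark_sql",
        tofile="dataframe_api",
        lineterm="",
    ))

# Hypothetical usage with two plan dumps pasted into strings:
#   for line in diff_plans(plan_case_1, plan_case_2):
#       print(line)
```

An empty result means the two dumps are the same apart from expression IDs, and any remaining `-`/`+` lines are the places to look at first.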

Thanks,

Chris 





Re: Spark SQL API taking longer time than DF API.

2019-04-08 Thread neeraj bhadani
Hi All,
Can anyone help me here with my query?

Regards,
Neeraj



Re: Spark SQL API taking longer time than DF API.

2019-04-01 Thread neeraj bhadani
In both cases, I am trying to create a Hive table from a union of the same
two queries.

I am not sure how the creation of the Hive table differs internally between
the two.

Regards,
Neeraj



Re: Spark SQL API taking longer time than DF API.

2019-03-31 Thread Jörn Franke
Is the select taking longer, or the saving to a file? You seem to save to a
file only in the second case.
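
One way to separate the two costs is to time an action that only runs the query and then time the write. This is a rough sketch with hypothetical helper names (`timed`, `time_select_and_save`). It assumes `df` is the unioned DataFrame from Case 2. count() is used here only as a cheap way to force execution, and Spark may prune work for it, so the first number is a lower bound rather than an exact select time.

```python
import time

def timed(label, action):
    """Run `action()` once, print how long it took, return (result, seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

def time_select_and_save(df, table_name):
    """Time the query alone, then the query plus the table write."""
    # count() forces the select/union to execute without writing a table.
    _, select_seconds = timed("select only", lambda: df.count())
    # saveAsTable() executes the query again and writes the result.
    _, save_seconds = timed("select + save",
                            lambda: df.write.saveAsTable(table_name))
    return select_seconds, save_seconds

# Hypothetical usage:
#   time_select_and_save(df3, "some_table")
```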



Re: Spark SQL API taking longer time than DF API.

2019-03-31 Thread neeraj bhadani
qry_1 and qry_2 are simple SELECT queries with a GROUP BY clause.

Are there any specific queries that behave differently between Spark SQL
and the DataFrame API?

Regards,
Neeraj



Re: Spark SQL API taking longer time than DF API.

2019-03-30 Thread Jason Nerothin
Can you please quantify the difference and provide the query code?


-- 
Thanks,
Jason


Spark SQL API taking longer time than DF API.

2019-03-29 Thread neeraj bhadani
Hi Team,
   I am executing the same Spark code using the Spark SQL API and the
DataFrame API; however, Spark SQL is taking longer than expected.

Please find the pseudocode below.
---
Case 1 : Spark SQL
---
%sql
CREATE TABLE 
AS

 WITH  AS (
 
)
, AS (
 
 )

SELECT * FROM 
UNION ALL
SELECT * FROM 

---
Case 2 : DataFrame API
---

df1 = spark.sql()
df2 = spark.sql()
df3 = df1.union(df2)
df3.write.saveAsTable()
---


As per my understanding, both Spark SQL and the DataFrame API generate the
same code under the hood, so the execution times should be similar.
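
For reference, the two cases above could be written out roughly as follows. This is only a sketch: the function names, the CTE names `q1`/`q2`, and the idea of passing the table name and the two queries as strings are all assumptions, since the pseudocode does not show them. `spark` is an existing SparkSession.

```python
def build_ctas(table_name, qry_1, qry_2):
    """Case 1: a single CTAS statement with two CTEs joined by UNION ALL."""
    return (
        f"CREATE TABLE {table_name} AS\n"
        f"WITH q1 AS (\n{qry_1}\n),\n"
        f"q2 AS (\n{qry_2}\n)\n"
        "SELECT * FROM q1\n"
        "UNION ALL\n"
        "SELECT * FROM q2"
    )

def run_case_sql(spark, table_name, qry_1, qry_2):
    """Case 1: let Spark SQL run the query and create the table in one go."""
    spark.sql(build_ctas(table_name, qry_1, qry_2))

def run_case_df(spark, table_name, qry_1, qry_2):
    """Case 2: build each part as a DataFrame, union them, then save."""
    df1 = spark.sql(qry_1)
    df2 = spark.sql(qry_2)
    # union() keeps duplicates, like UNION ALL in SQL.
    df3 = df1.union(df2)
    df3.write.saveAsTable(table_name)
```

The SELECT part should plan the same way in both cases. One thing that may be worth checking, though nothing in this thread confirms it, is the write step: depending on the Spark version and configuration, a CREATE TABLE ... AS statement without a USING clause and saveAsTable() can write the table in different storage formats.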


Regards,

Neeraj