[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

Alexey Zinoviev (Jira) Thu, 26 Sep 2019 10:28:22 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938765#comment-16938765
 ]


Alexey Zinoviev edited comment on IGNITE-12054 at 9/26/19 5:27 PM:
-------------------------------------------------------------------

Also, in Spark was fixed bug with incorrect null handling on columns in codition

https://issues.apache.org/jira/browse/SPARK-21479

It leads to IgniteOptimizationJoinSpec fixes (the same thing was in the 
previous migration from 2.2 to 2.3)

 

 

I made experiment with Spark code for version 2.3 it generates the next plan

== Parsed Logical Plan ==
'Project ['jt1.id AS id1#28, 'jt1.val1, 'jt2.id AS id2#29, 'jt2.val2]
+- 'Join LeftOuter, ('jt1.val1 = 'jt2.val2)
 :- 'UnresolvedRelation `jt1`
 +- 'UnresolvedRelation `jt2`

== Analyzed Logical Plan ==
id1: string, val1: string, id2: string, val2: string
Project [id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25]
+- Join LeftOuter, (val1#11 = val2#25)
 :- SubqueryAlias jt1
 : +- Relation[id#10,val1#11] csv
 +- SubqueryAlias jt2
 +- Relation[id#24,val2#25] csv

== Optimized Logical Plan ==
Project [id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25]
+- Join LeftOuter, (val1#11 = val2#25)
 :- Relation[id#10,val1#11] csv
 +- Relation[id#24,val2#25] csv

 

The 2.4 generates

== Parsed Logical Plan ==
'Project ['jt1.id AS id1#28, 'jt1.val1, 'jt2.id AS id2#29, 'jt2.val2]
+- 'Join LeftOuter, ('jt1.val1 = 'jt2.val2)
 :- 'UnresolvedRelation `jt1`
 +- 'UnresolvedRelation `jt2`

== Analyzed Logical Plan ==
id1: string, val1: string, id2: string, val2: string
Project [id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25]
+- Join LeftOuter, (val1#11 = val2#25)
 :- SubqueryAlias `jt1`
 : +- Relation[id#10,val1#11] csv
 +- SubqueryAlias `jt2`
 +- Relation[id#24,val2#25] csv

== Optimized Logical Plan ==
Project [id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25]
+- Join LeftOuter, (val1#11 = val2#25)
 :- Relation[id#10,val1#11] csv
 +- Filter isnotnull(val2#25)
 +- Relation[id#24,val2#25] csv

 

The +- Filter isnotnull(val2#25) is added in optimized logical plan

But in reality it doesn't work properly (and doesn't filter in Spark), but wow! 
It works for Ignite (because we honestly work with Spark plan)

 

If you enable next option 

.config("ignite.disableSparkSQLOptimization", "true") - the behaviour will be 
the same in Ignite and Spark and will not add the filter

 

The best approach for Spark 2.4 - disableSparkOptimization before fixing on 
Spark side (I could create a bug for this)


was (Author: zaleslaw):
Also, in Spark was fixed bug with incorrect null handling on columns in codition

https://issues.apache.org/jira/browse/SPARK-21479

It leads to IgniteOptimizationJoinSpec fixes (the same thing was in the 
previous migration from 2.2 to 2.3)

> Upgrade Spark module to 2.4
> ---------------------------
>
>                 Key: IGNITE-12054
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12054
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 2.7.5
>            Reporter: Denis A. Magda
>            Assignee: Alexey Zinoviev
>            Priority: Blocker
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

Reply via email to