[ https://issues.apache.org/jira/browse/SPARK-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Bryński updated SPARK-11282:
-----------------------------------
Description:
Hi,
I found very strange broadcast join behaviour.

According to this Jira https://issues.apache.org/jira/browse/SPARK-10577 I'm using a hint for broadcast join. (I patched 1.5.1 with https://github.com/apache/spark/pull/8801/files )

I found that whether this feature works depends on executor memory. In my case the broadcast join works correctly up to 31G. Example:

{code}
spark1:~/ab$ ~/spark/bin/spark-submit --executor-memory 31G debug_broadcast_join.py true
Creating test tables...
Joining tables...
Joined table schema:
root
 |-- id: long (nullable = true)
 |-- val: long (nullable = true)
 |-- id2: long (nullable = true)
 |-- val2: long (nullable = true)
Selecting data for id = 5...
[Row(id=5, val=5, id2=5, val2=5)]

spark$ ~/spark/bin/spark-submit --executor-memory 32G debug_broadcast_join.py true
Creating test tables...
Joining tables...
Joined table schema:
root
 |-- id: long (nullable = true)
 |-- val: long (nullable = true)
 |-- id2: long (nullable = true)
 |-- val2: long (nullable = true)
Selecting data for id = 5...
[Row(id=5, val=5, id2=None, val2=None)]
{code}

Please find example code attached.

> Very strange broadcast join behaviour
> -------------------------------------
>
>                 Key: SPARK-11282
>                 URL: https://issues.apache.org/jira/browse/SPARK-11282
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.1
>            Reporter: Maciej Bryński
>            Priority: Critical
>         Attachments: SPARK-11282.py
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)