[ https://issues.apache.org/jira/browse/SPARK-47034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817123#comment-17817123 ]
Bruce Robbins commented on SPARK-47034:
---------------------------------------

I wonder if this is SPARK-45592 (and, relatedly, SPARK-45282), which existed as a bug in 3.5.0 but is fixed on master and branch-3.5.

> join between cached temp tables result in missing entries
> ---------------------------------------------------------
>
>                 Key: SPARK-47034
>                 URL: https://issues.apache.org/jira/browse/SPARK-47034
>             Project: Spark
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 3.5.0
>            Reporter: shurik mermelshtein
>            Priority: Major
>
> We create several temp tables (views) by loading several Delta tables and
> joining them.
> Those views are used to calculate different metrics, and each metric
> requires different views. Some of the more popular views are
> cached for better performance.
> We noticed that once we upgraded from Spark 3.4.2 to Spark 3.5.0, some
> of the joins started to fail.
> We can reproduce a case where we have 2 data frames (views). (These are not the
> real names/values we use; this is just for the example.)
> # users, with the columns user_id, campaign_id, user_name.
> We make sure it has a single entry:
> '111111', '22222', 'Jhon Doe'
> # actions, with the columns user_id, campaign_id, action_id, action count.
> We make sure it has a single entry:
> '111111', '22222', 'clicks', 5
>
> # The users view can be filtered for user_id = '111111' and/or campaign_id =
> '22222', and it will find the existing single row.
> # The actions view can be filtered for user_id = '111111' and/or campaign_id =
> '22222', and it will find the existing single row.
> # users and actions can be inner joined by user_id *OR* campaign_id, and the
> join succeeds.
> # users and actions can *not* be inner joined by user_id *AND* campaign_id.
> The join returns no rows.
> # If we write both of the views to S3 and read them back into new data frames,
> suddenly the join works.
> # If we disable AQE, the join works.
> # Running checkpoint on the views does not make join #4 work.