[ 
https://issues.apache.org/jira/browse/SPARK-25961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679237#comment-16679237
 ] 

Dongjoon Hyun commented on SPARK-25961:
---------------------------------------

[~zengxl]. Please use English in Apache Spark JIRA.

> Random numbers are not supported when handling data skew
> ---------------
>
>                 Key: SPARK-25961
>                 URL: https://issues.apache.org/jira/browse/SPARK-25961
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>         Environment: spark on yarn 2.3.1
>            Reporter: zengxl
>            Priority: Major
>
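> When joining two tables where one of them has empty values in the join key, adding a random number to the join key is rejected with: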
> Error in query: nondeterministic expressions are only allowed in
> Project, Filter, Aggregate or Window, found
> Looking at the source code, the SQL is validated in org.apache.spark.sql.catalyst.analysis.CheckAnalysis, and the random number is rejected because it is a nondeterministic value:
> case o if o.expressions.exists(!_.deterministic) &&
>  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
>  // The rule above is used to check Aggregate operator.
>  failAnalysis(
>  s"""nondeterministic expressions are only allowed in
>  |Project, Filter, Aggregate or Window, found:
>  | ${o.expressions.map(_.sql).mkString(",")}
>  |in operator ${operator.simpleString}
>  """.stripMargin)
> Would it be enough to add the Join case to this code? I have not tested it yet:
> case o if o.expressions.exists(!_.deterministic) &&
>  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>  !o.isInstanceOf[Join] =>
>  // The rule above is used to check Aggregate operator.
>  failAnalysis(
>  s"""nondeterministic expressions are only allowed in
>  |Project, Filter, Aggregate, Window or Join, found:
>  | ${o.expressions.map(_.sql).mkString(",")}
>  |in operator ${operator.simpleString}
>  """.stripMargin)
>  
> My SQL:
> SELECT
> T1.CUST_NO AS CUST_NO ,
> T3.CON_LAST_NAME AS CUST_NAME ,
> T3.CON_SEX_MF AS SEX_CODE ,
> T3.X_POSITION AS POST_LV_CODE 
> FROM tmp.ICT_CUST_RANGE_INFO T1
> LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3
>   ON CASE WHEN coalesce(T1.CUST_NO,'') = '' THEN concat('cust_no',RAND()) ELSE T1.CUST_NO END = T3.BECIF
>  AND T3.DATE='20181105'
> WHERE T1.DATE='20181105'
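
The error message quoted above lists Project among the operators where nondeterministic expressions are allowed, so one possible workaround (an assumption here, not something confirmed in this thread) is to compute the salted key in a subquery first and then join on the resulting column. The sketch below illustrates that two-step idea in plain Python rather than Spark. The helper names (salted_key, left_join, partition_sizes) are hypothetical, and the CRC32-based partitioner is only a stand-in for a shuffle's hash partitioning.

```python
import random
import zlib
from collections import Counter


def salted_key(cust_no, rng):
    """Mirror the CASE WHEN in the query: an empty or missing key is
    replaced by 'cust_no' plus a random number, so it matches nothing."""
    if cust_no is None or cust_no == "":
        return "cust_no" + str(rng.random())
    return cust_no


def left_join(t1_rows, t3_by_becif, rng):
    """Left join T1 to T3 on the salted key, in two separate steps."""
    # Step 1 (the "projection"): materialize the salted key once per row.
    projected = [(row, salted_key(row["CUST_NO"], rng)) for row in t1_rows]
    # Step 2 (the join): the condition is now a plain equality on a column.
    return [(row["CUST_NO"], key, t3_by_becif.get(key)) for row, key in projected]


def partition_sizes(keys, num_partitions):
    """Count rows per partition under a simple, deterministic hash
    partitioner (a stand-in for a shuffle, not Spark's actual partitioner)."""
    return Counter(zlib.crc32(k.encode("utf-8")) % num_partitions for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    t1 = [{"CUST_NO": "C001"}, {"CUST_NO": ""}, {"CUST_NO": None}]
    t3 = {"C001": "example name"}  # hypothetical data keyed by BECIF
    for cust_no, key, match in left_join(t1, t3, rng):
        print(repr(cust_no), key, match)

    # Without salting, every empty key hashes to the same partition.
    print(partition_sizes([""] * 1000, 8))
    # With salting, the same rows are spread over several partitions.
    print(partition_sizes([salted_key("", rng) for _ in range(1000)], 8))
```

Under this assumption, the equivalent change to the query would be to select the CASE WHEN expression as an extra column in a subquery over T1 and use that column in the ON clause, which keeps RAND() out of the Join operator itself.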



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

