[
https://issues.apache.org/jira/browse/SPARK-25961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-25961:
---------------------------------
Description:
My SQL query uses two tables to join. One table join key has null value. I use
random value instead of {{null}};however, it has error. The error as follows:
{code}
Error in query: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found
scan spark source code is org.apache.spark.sql.catalyst.analysis.CheckAnalysis
check sql, because the number of random variables is uncertain, it is prohibited
{code}
{code}
case o if o.expressions.exists(!_.deterministic) &&
!o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
!o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
// The rule above is used to check Aggregate operator.
failAnalysis(
s"""nondeterministic expressions are only allowed in
|Project, Filter, Aggregate or Window, found:|
|${o.expressions.map(_.sql).mkString(",")}|
|in operator ${operator.simpleString}
""".stripMargin)|
{code}
Is it possible to add Join to this code? It's not yet tested.And whether there
will be other effects
{code}
case o if o.expressions.exists(!_.deterministic) &&
!o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
!o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] +{color:#d04437}&&
!o.isInstanceOf[Join]{color}+ =>
// The rule above is used to check Aggregate operator.
failAnalysis(
s"""nondeterministic expressions are only allowed in
|Project, Filter, Aggregate or Window or Join, found:|
|${o.expressions.map(_.sql).mkString(",")}|
|in operator ${operator.simpleString}
""".stripMargin)|
{code}
This is my SQL query is as below:
{code}
SELECT T1.CUST_NO AS CUST_NO,
T3.CON_LAST_NAME AS CUST_NAME,
T3.CON_SEX_MF AS SEX_CODE,
T3.X_POSITION AS POST_LV_CODE
FROM tmp.ICT_CUST_RANGE_INFO T1
LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3 ON CASE
WHEN coalesce(T1.CUST_NO, '') =''
THEN concat('cust_no', RAND())
ELSE T1.CUST_NO
END = T3.BECIF
AND T3.DATE='20181105'
WHERE T1.DATE='20181105'
{code}
was:
my query sql use two table join,one table join key has null value,i use rand
value instead of null value,but has error,the error info as follows:
Error in query: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found
scan spark source code is org.apache.spark.sql.catalyst.analysis.CheckAnalysis
check sql, because the number of random variables is uncertain, it is prohibited
case o if o.expressions.exists(!_.deterministic) &&
!o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
!o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
// The rule above is used to check Aggregate operator.
failAnalysis(
s"""nondeterministic expressions are only allowed in
|Project, Filter, Aggregate or Window, found:|
|${o.expressions.map(_.sql).mkString(",")}|
|in operator ${operator.simpleString}
""".stripMargin)|
Is it possible to add Join to this code? It's not yet tested.And whether there
will be other effects
case o if o.expressions.exists(!_.deterministic) &&
!o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
!o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] +{color:#d04437}&&
!o.isInstanceOf[Join]{color}+ =>
// The rule above is used to check Aggregate operator.
failAnalysis(
s"""nondeterministic expressions are only allowed in
|Project, Filter, Aggregate or Window or Join, found:|
|${o.expressions.map(_.sql).mkString(",")}|
|in operator ${operator.simpleString}
""".stripMargin)|
this is my sparksql:
SELECT
T1.CUST_NO AS CUST_NO ,
T3.CON_LAST_NAME AS CUST_NAME ,
T3.CON_SEX_MF AS SEX_CODE ,
T3.X_POSITION AS POST_LV_CODE
FROM tmp.ICT_CUST_RANGE_INFO T1
LEFT join tmp.F_CUST_BASE_INFO_ALL T3 ON CASE WHEN coalesce(T1.CUST_NO,'') =''
THEN concat('cust_no',RAND()) ELSE T1.CUST_NO END = T3.BECIF and
T3.DATE='20181105'
WHERE T1.DATE='20181105'
> Random numbers are not supported when handling data skew
> --------------------------------------------------------
>
> Key: SPARK-25961
> URL: https://issues.apache.org/jira/browse/SPARK-25961
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Environment: spark on yarn 2.3.1
> Reporter: zengxl
> Priority: Major
>
> My SQL query uses two tables to join. One table join key has null value. I
> use random value instead of {{null}};however, it has error. The error as
> follows:
> {code}
> Error in query: nondeterministic expressions are only allowed in
> Project, Filter, Aggregate or Window, found
> scan spark source code is
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis check sql, because the
> number of random variables is uncertain, it is prohibited
> {code}
> {code}
> case o if o.expressions.exists(!_.deterministic) &&
> !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
> !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
> // The rule above is used to check Aggregate operator.
> failAnalysis(
> s"""nondeterministic expressions are only allowed in
> |Project, Filter, Aggregate or Window, found:|
> |${o.expressions.map(_.sql).mkString(",")}|
> |in operator ${operator.simpleString}
> """.stripMargin)|
> {code}
>
> Is it possible to add Join to this code? It's not yet tested.And whether
> there will be other effects
> {code}
> case o if o.expressions.exists(!_.deterministic) &&
> !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
> !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] +{color:#d04437}&&
> !o.isInstanceOf[Join]{color}+ =>
> // The rule above is used to check Aggregate operator.
> failAnalysis(
> s"""nondeterministic expressions are only allowed in
> |Project, Filter, Aggregate or Window or Join, found:|
> |${o.expressions.map(_.sql).mkString(",")}|
> |in operator ${operator.simpleString}
> """.stripMargin)|
> {code}
> This is my SQL query is as below:
> {code}
> SELECT T1.CUST_NO AS CUST_NO,
> T3.CON_LAST_NAME AS CUST_NAME,
> T3.CON_SEX_MF AS SEX_CODE,
> T3.X_POSITION AS POST_LV_CODE
> FROM tmp.ICT_CUST_RANGE_INFO T1
> LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3 ON CASE
> WHEN coalesce(T1.CUST_NO, '')
> ='' THEN concat('cust_no', RAND())
> ELSE T1.CUST_NO
> END = T3.BECIF
> AND T3.DATE='20181105'
> WHERE T1.DATE='20181105'
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]