Hi Wenchen

Yes, we also found that the error is caused by rand(). However, this is the classic
way to solve data skew in Hive. Is there any equivalent way in Spark?
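
For example, would something like the following work? This is only an untested
sketch: since nondeterministic expressions are allowed in Project, the idea is to
compute the salted key in a subquery before the join, so the join condition itself
stays deterministic. The table and column names are the ones from the quoted query
below; the join_key alias is only illustrative.

SELECT a.col1
FROM (
    SELECT col1,
           CASE
               WHEN col2 IS NULL
                   THEN cast(rand(9) * 1000 - 9999999999 as string)  -- salt NULL keys so they spread across partitions
               ELSE col2
           END AS join_key
    FROM tbl1
) a
LEFT OUTER JOIN tbl2 b
    ON a.join_key = b.col3;

Since rand() then only appears in the subquery's SELECT list, the join condition is
a plain column equality, which should satisfy the analyzer's restriction. If there is
a better or recommended approach in Spark, please let us know.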

Thanks
Chang

On Thu, Jul 13, 2017 at 8:25 PM, Wenchen Fan <cloud0...@gmail.com> wrote:

> It’s not about case when, but about rand(). Non-deterministic expressions
> are not allowed in join conditions.
>
> > On 13 Jul 2017, at 6:43 PM, wangshuang <cn_...@qq.com> wrote:
> >
> > I'm trying to execute Hive SQL on Spark SQL (also on the Spark Thrift
> > Server). To mitigate data skew, we use "case when" to handle NULL keys.
> > A simple example of the SQL follows:
> >
> >
> > SELECT a.col1
> > FROM tbl1 a
> > LEFT OUTER JOIN tbl2 b
> > ON
> > *     CASE
> >               WHEN a.col2 IS NULL
> >               THEN cast(rand(9)*1000 - 9999999999 as string)
> >               ELSE
> >                       a.col2 END *
> >       = b.col3;
> >
> >
> > But I get the error:
> >
> > == Physical Plan ==
> > *org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window, found:*
> > (((CASE WHEN (a.`nav_tcdt` IS NULL) THEN CAST(((rand(9) * CAST(1000 AS DOUBLE)) - CAST(9999999999L AS DOUBLE)) AS STRING) ELSE a.`nav_tcdt` END = c.`site_categ_id`) AND (CAST(a.`nav_tcd` AS INT) = 9)) AND (c.`cur_flag` = 1))
> > in operator Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
> > ;;
> > GlobalLimit 10
> > +- LocalLimit 10
> >    +- Aggregate [date_id#7, CASE WHEN (cast(city_id#10 as string) IN (cast(19596 as string),cast(20134 as string),cast(10997 as string)) && nav_tcdt#25 RLIKE ^[0-9]+$) THEN city_id#10 ELSE nav_tpa_id#21 END], [date_id#7]
> >       +- Filter (date_id#7 = 2017-07-12)
> >          +- Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
> >             :- SubqueryAlias a
> >             :  +- SubqueryAlias tmp_lifan_trfc_tpa_hive
> >             :     +- CatalogRelation `tmp`.`tmp_lifan_trfc_tpa_hive`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [date_id#7, chanl_id#8L, pltfm_id#9, city_id#10, sessn_id#11, gu_id#12, nav_refer_page_type_id#13, nav_refer_page_value#14, nav_refer_tpa#15, nav_refer_tpa_id#16, nav_refer_tpc#17, nav_refer_tpi#18, nav_page_type_id#19, nav_page_value#20, nav_tpa_id#21, nav_tpa#22, nav_tpc#23, nav_tpi#24, nav_tcdt#25, nav_tcd#26, nav_tci#27, nav_tce#28, detl_refer_page_type_id#29, detl_refer_page_value#30, ... 33 more fields]
> >             +- SubqueryAlias c
> >                +- SubqueryAlias dim_site_categ_ext
> >                   +- CatalogRelation `dw`.`dim_site_categ_ext`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [site_categ_skid#64L, site_categ_type#65, site_categ_code#66, site_categ_name#67, site_categ_parnt_skid#68L, site_categ_kywrd#69, leaf_flg#70L, sort_seq#71L, site_categ_srch_name#72, vsbl_flg#73, delet_flag#74, etl_batch_id#75L, updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L, site_categ_id#80, site_categ_parnt_id#81]
> >
> > Does Spark SQL not support the "case when" syntax in a JOIN condition?
> > Additionally, my Spark version is 2.2.0.
> > Any help would be greatly appreciated.
> >
> >
> >
> >
