[
https://issues.apache.org/jira/browse/SPARK-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239637#comment-15239637
]
Cheng Lian commented on SPARK-13753:
------------------------------------
[~jingweilu] Could you please provide the schemas of the tables involved in the
SQL query so that we can reproduce this issue more easily? It would also be
very helpful if you could derive a minimized query that reproduces the issue.
Thanks!
> Column nullable is derived incorrectly
> --------------------------------------
>
> Key: SPARK-13753
> URL: https://issues.apache.org/jira/browse/SPARK-13753
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Jingwei Lu
> Priority: Critical
>
> Spark SQL derives column nullability incorrectly and then uses the wrong
> nullability during optimization. Consider the following query:
> {code}
> select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0]
> from (
>   select explode(map(
>     a.frontend[0],
>       ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p50"),
>     a.frontend[1],
>       ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p90"),
>     a.backend[0],
>       ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p50"),
>     a.backend[1],
>       ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p90"),
>     a.render[0],
>       ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p50"),
>     a.render[1],
>       ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p90"),
>     a.page_load_time[0],
>       ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p50"),
>     a.page_load_time[1],
>       ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p90"),
>     a.total_load_time[0],
>       ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p50"),
>     a.total_load_time[1],
>       ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"),
>                    ",action:", COALESCE(action, "null")), ".p90")
>   )) as (value, tags)
>   from (
>     select data.controller as controller,
>            data.action as action,
>            percentile(data.frontend, array(0.5, 0.9)) as frontend,
>            percentile(data.backend, array(0.5, 0.9)) as backend,
>            percentile(data.render, array(0.5, 0.9)) as render,
>            percentile(data.page_load_time, array(0.5, 0.9)) as page_load_time,
>            percentile(data.total_load_time, array(0.5, 0.9)) as total_load_time
>     from air_events_rt
>     where type='air_events' and data.event_name='pageload'
>     group by data.controller, data.action
>   ) a
> ) b
> where b.value is not null
> {code}
> b.value is incorrectly derived as non-nullable. As a result, the optimizer
> ignores the "b.value is not null" predicate, which causes the query to return
> incorrect results.
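
The failure mode reported in the quoted description can be illustrated with a toy
model in Python. This is only a sketch, not Spark's optimizer code: `Column`,
`simplify_is_not_null`, and `run_filter` are hypothetical names invented here,
and the sample rows are made up. It assumes, as the report describes, that an
IS NOT NULL filter is dropped whenever the planner believes the column is
non-nullable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata as seen by a planner."""
    name: str
    nullable: bool  # what the planner believes, which may be wrong


def simplify_is_not_null(column: Column) -> Optional[str]:
    """Return None if the filter is considered redundant, else the predicate."""
    if not column.nullable:
        # The planner trusts the metadata: a non-nullable column can never
        # be NULL, so "col IS NOT NULL" is treated as always true and dropped.
        return None
    return f"{column.name} IS NOT NULL"


def run_filter(rows: List[Dict[str, Optional[float]]],
               column: Column) -> List[Dict[str, Optional[float]]]:
    """Apply the (possibly eliminated) IS NOT NULL filter to the rows."""
    if simplify_is_not_null(column) is None:
        return list(rows)  # filter was eliminated, every row passes
    return [row for row in rows if row[column.name] is not None]


# Made-up data containing one NULL value.
rows = [{"value": 1.5}, {"value": None}, {"value": 3.0}]

# Nullability derived correctly: the NULL row is filtered out.
correct = run_filter(rows, Column("value", nullable=True))

# Nullability derived incorrectly: the filter is dropped and the NULL row leaks.
buggy = run_filter(rows, Column("value", nullable=False))
```

In this model the wrong `nullable=False` flag is the only difference between
the two runs, which is why a minimized query and the table schemas would help
locate where the nullability is derived incorrectly.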