[ 
https://issues.apache.org/jira/browse/SPARK-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336567#comment-15336567
 ] 

Davies Liu commented on SPARK-13753:
------------------------------------

After discussing this with [~cloud_fan]: we do have a runtime check to make 
sure that the key of a Map cannot be null, but we do not have a corresponding 
check on the schema. So the query could fail, but it should not return 
incorrect results.

Could you provide more detail on the expected and actual results?

> Column nullable is derived incorrectly
> --------------------------------------
>
>                 Key: SPARK-13753
>                 URL: https://issues.apache.org/jira/browse/SPARK-13753
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Jingwei Lu
>            Priority: Critical
>
> Spark SQL derives column nullability incorrectly and then uses the wrong 
> nullability during optimization. In the following query:
> {code}
> select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0]
>               from (
>                 select explode(map(a.frontend[0], 
> ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), 
> ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.frontend[1], 
> ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), 
> ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.backend[0], ARRAY(concat("metric:backend", 
> ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, 
> "null")), ".p50"),
>                                  a.backend[1], ARRAY(concat("metric:backend", 
> ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, 
> "null")), ".p90"),
>                                  a.render[0], ARRAY(concat("metric:render", 
> ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, 
> "null")), ".p50"),
>                                  a.render[1], ARRAY(concat("metric:render", 
> ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, 
> "null")), ".p90"),
>                                  a.page_load_time[0], 
> ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, 
> "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.page_load_time[1], 
> ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, 
> "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.total_load_time[0], 
> ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, 
> "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.total_load_time[1], 
> ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, 
> "null"), ",action:", COALESCE(action, "null")), ".p90"))) as (value, tags)
>                 from (
>                   select  data.controller as controller, data.action as 
> action,
>                           percentile(data.frontend, array(0.5, 0.9)) as 
> frontend,
>                           percentile(data.backend, array(0.5, 0.9)) as 
> backend,
>                           percentile(data.render, array(0.5, 0.9)) as render,
>                           percentile(data.page_load_time, array(0.5, 0.9)) as 
> page_load_time,
>                           percentile(data.total_load_time, array(0.5, 0.9)) 
> as total_load_time
>                   from air_events_rt
>                   where type='air_events' and data.event_name='pageload'
>                   group by data.controller, data.action
>                 ) a
>               ) b
>               where b.value is not null
> {code}
> b.value is incorrectly derived as not nullable, so the optimizer ignores the 
> "b.value is not null" predicate, which causes the query to return incorrect 
> results.
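
The mechanism described in the report can be illustrated with a toy model. 
This is only a sketch in plain Python, not Spark's optimizer code: the names 
Column, plan_is_not_null_filter and run are hypothetical and exist only for 
this illustration. It assumes an optimizer that drops an "is not null" filter 
whenever the schema claims the column is not nullable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class Column:
    """Hypothetical schema entry: `nullable` is a claim that may be wrong."""
    name: str
    nullable: bool


def plan_is_not_null_filter(col: Column) -> Optional[Callable[[Row], bool]]:
    """Toy optimizer rule: if the schema says the column cannot be null,
    the `is not null` predicate is considered always true and is dropped
    (represented here by returning None)."""
    if not col.nullable:
        return None
    return lambda row: row[col.name] is not None


def run(rows: List[Row], predicate: Optional[Callable[[Row], bool]]) -> List[Row]:
    """Apply the predicate if the planner kept it, otherwise return all rows."""
    if predicate is None:
        return list(rows)
    return [r for r in rows if predicate(r)]


# Data that actually contains a null, regardless of what the schema claims.
rows = [{"value": 12.5}, {"value": None}]

# Schema wrongly derived as non-nullable: the filter is dropped.
wrong_result = run(rows, plan_is_not_null_filter(Column("value", nullable=False)))

# Schema correctly derived as nullable: the filter is kept.
right_result = run(rows, plan_is_not_null_filter(Column("value", nullable=True)))

print(wrong_result)
print(right_result)
```

In this model the null row survives only when nullability is derived 
incorrectly, which is the symptom reported for "b.value is not null".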


