[ https://issues.apache.org/jira/browse/SPARK-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-13753:
-------------------------------------
    Target Version/s: 2.0.0
            Priority: Critical  (was: Major)

> Column nullable is derived incorrectly
> --------------------------------------
>
>                 Key: SPARK-13753
>                 URL: https://issues.apache.org/jira/browse/SPARK-13753
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Jingwei Lu
>            Priority: Critical
>
> Spark SQL derives column nullability incorrectly, and the optimizer then
> relies on the wrong result. In the following query:
> {code}
> select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0]
> from (
>   select explode(map(
>       a.frontend[0],
>       ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>       a.frontend[1],
>       ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>       a.backend[0],
>       ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>       a.backend[1],
>       ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>       a.render[0],
>       ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>       a.render[1],
>       ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>       a.page_load_time[0],
>       ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>       a.page_load_time[1],
>       ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>       a.total_load_time[0],
>       ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>       a.total_load_time[1],
>       ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90")
>   )) as (value, tags)
>   from (
>     select data.controller as controller, data.action as action,
>            percentile(data.frontend, array(0.5, 0.9)) as frontend,
>            percentile(data.backend, array(0.5, 0.9)) as backend,
>            percentile(data.render, array(0.5, 0.9)) as render,
>            percentile(data.page_load_time, array(0.5, 0.9)) as page_load_time,
>            percentile(data.total_load_time, array(0.5, 0.9)) as total_load_time
>     from air_events_rt
>     where type='air_events' and data.event_name='pageload'
>     group by data.controller, data.action
>   ) a
> ) b
> where b.value is not null
> {code}
> b.value is incorrectly derived as not nullable. As a result, the optimizer
> drops the "b.value is not null" predicate, which causes the query to return
> incorrect results.
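
The failure mode described in the report can be illustrated with a toy model. The sketch below is not Spark's Catalyst code; `Column`, `plan_is_not_null`, and `run` are hypothetical names invented for illustration. It only assumes what the report states: when a planner believes a column is non-nullable, it treats an "is not null" predicate as always true and drops it, so NULL rows leak through if that belief is wrong.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool  # what the planner believes, not what the data contains


def plan_is_not_null(col: Column) -> Optional[Callable[[Row], bool]]:
    """Plan `col IS NOT NULL`; return None when the predicate is dropped."""
    if not col.nullable:
        # Believed non-nullable: the predicate looks always true, so drop it.
        return None
    return lambda row: row[col.name] is not None


def run(rows: List[Row], col: Column) -> List[Row]:
    """Apply the planned filter, or pass every row through if it was dropped."""
    pred = plan_is_not_null(col)
    return rows if pred is None else [r for r in rows if pred(r)]


rows: List[Row] = [{"value": 1.5}, {"value": None}]

# Nullability derived correctly: the NULL row is filtered out.
correct = run(rows, Column("value", nullable=True))

# Nullability derived incorrectly (the situation in this report):
# the filter is dropped and the NULL row is returned.
wrong = run(rows, Column("value", nullable=False))
```

In this model the data is identical in both runs; only the planner's nullability flag differs, which is why the bug shows up as wrong query results rather than as an error.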