[
https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher Bryant updated SPARK-35480:
---------------------------------------
Description:
The percentile_approx PySpark function does not handle its "accuracy" and
"percentage" parameters correctly when pivoting on a column: the pivot rewrite
wraps them in conditional (IF) expressions, so they are no longer the constant
literals the analyzer requires, and the query below fails (it also fails if
the accuracy parameter is left unspecified):
----
{{import pyspark.sql.functions as F}}
{{df = sc.parallelize([}}
{{ ["a", -1.0],}}
{{ ["a", 5.5],}}
{{ ["a", 2.5],}}
{{ ["b", 3.0],}}
{{ ["b", 5]}}
{{]).toDF(["type", "value"]) \}}
{{ .groupBy() \}}
{{ .pivot("type", ["a", "b"]) \}}
{{ .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))}}
----
Error message:
{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a'
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 10000,
CAST(NULL AS INT))))' due to data type mismatch: The accuracy or percentage
provided must be a constant literal; 'Aggregate [percentile_approx(if
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array<double>),
if ((type#242 <=> cast(a as string))) 10000 else cast(null as int), 0, 0) AS
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else
cast(null as array<double>), if ((type#242 <=> cast(b as string))) 10000 else
cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243],
false}}
was:
The percentile_approx PySpark function does not appear to treat the "accuracy"
parameter correctly when pivoting on a column, causing the query below to fail
(this also fails if the accuracy parameter is left unspecified):
----
{{import pyspark.sql.functions as F}}
{{df = sc.parallelize([}}
{{ ["a", -1.0],}}
{{ ["a", 5.5],}}
{{ ["a", 2.5],}}
{{ ["b", 3.0],}}
{{ ["b", 5]}}
{{]).toDF(["type", "value"]) \}}
{{ .groupBy() \}}
{{ .pivot("type", ["a", "b"]) \}}
{{ .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))}}
----
Error message:
{{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a'
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 10000,
CAST(NULL AS INT))))' due to data type mismatch: The accuracy or percentage
provided must be a constant literal; 'Aggregate [percentile_approx(if
((type#242 <=> cast(a as string))) value#243 else cast(null as double), if
((type#242 <=> cast(a as string))) array(0.5) else cast(null as array<double>),
if ((type#242 <=> cast(a as string))) 10000 else cast(null as int), 0, 0) AS
a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else
cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else
cast(null as array<double>), if ((type#242 <=> cast(b as string))) 10000 else
cast(null as int), 0, 0) AS b#253] +- LogicalRDD [type#242, value#243], false}}
> percentile_approx function doesn't work with pivot
> --------------------------------------------------
>
> Key: SPARK-35480
> URL: https://issues.apache.org/jira/browse/SPARK-35480
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.1.1
> Reporter: Christopher Bryant
> Priority: Major
>
> The percentile_approx PySpark function does not handle its "accuracy" and
> "percentage" parameters correctly when pivoting on a column: the pivot
> rewrite wraps them in conditional (IF) expressions, so they are no longer
> the constant literals the analyzer requires, and the query below fails (it
> also fails if the accuracy parameter is left unspecified):
> ----
> {{import pyspark.sql.functions as F}}
> {{df = sc.parallelize([}}
> {{ ["a", -1.0],}}
> {{ ["a", 5.5],}}
> {{ ["a", 2.5],}}
> {{ ["b", 3.0],}}
> {{ ["b", 5]}}
> {{]).toDF(["type", "value"]) \}}
> {{ .groupBy() \}}
> {{ .pivot("type", ["a", "b"]) \}}
> {{ .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))}}
> ----
> Error message:
> {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=>
> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=>
> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS
> STRING)), 10000, CAST(NULL AS INT))))' due to data type mismatch: The
> accuracy or percentage provided must be a constant literal; 'Aggregate
> [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else
> cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else
> cast(null as array<double>), if ((type#242 <=> cast(a as string))) 10000 else
> cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b
> as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as
> string))) array(0.5) else cast(null as array<double>), if ((type#242 <=>
> cast(b as string))) 10000 else cast(null as int), 0, 0) AS b#253] +-
> LogicalRDD [type#242, value#243], false}}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]