Christopher Bryant created SPARK-35480:
------------------------------------------

             Summary: percentile_approx function doesn't work with pivot
                 Key: SPARK-35480
                 URL: https://issues.apache.org/jira/browse/SPARK-35480
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 3.1.1
            Reporter: Christopher Bryant


The percentile_approx PySpark function does not handle its "percentage" and 
"accuracy" arguments correctly when used as the aggregate of a pivot: the pivot 
rewrite wraps every argument, including these literals, in conditional (IF) 
expressions, so they no longer pass the function's constant-literal check. This 
causes the query below to fail (it also fails if the accuracy parameter is left 
unspecified):
{code:python}
import pyspark.sql.functions as F

df = sc.parallelize([
    ["a", -1.0],
    ["a", 5.5],
    ["a", 2.5],
    ["b", 3.0],
    ["b", 5.0]  # was 5 (int); a float keeps schema inference consistent
]).toDF(["type", "value"]) \
    .groupBy() \
    .pivot("type", ["a", "b"]) \
    .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))
{code}
Error message: 

{noformat}
AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a'
AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS
STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 10000,
CAST(NULL AS INT))))' due to data type mismatch: The accuracy or percentage
provided must be a constant literal;
'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array<double>), if ((type#242 <=> cast(a as string))) 10000 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array<double>), if ((type#242 <=> cast(b as string))) 10000 else cast(null as int), 0, 0) AS b#253]
+- LogicalRDD [type#242, value#243], false
{noformat}
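
A workaround that should sidestep the check, sketched below (assuming first() 
is acceptable for collapsing the pre-aggregated rows): run percentile_approx in 
a plain groupBy("type") aggregation first, where its percentage and accuracy 
arguments remain literal, and only then pivot the one-row-per-type result.

{code:python}
import pyspark.sql.functions as F

# Same input data as the reproduction above.
base = sc.parallelize([
    ["a", -1.0],
    ["a", 5.5],
    ["a", 2.5],
    ["b", 3.0],
    ["b", 5.0]
]).toDF(["type", "value"])

# Aggregate per type first: percentile_approx still sees plain literal
# percentage/accuracy arguments here, so the constant-literal check passes.
per_type = base.groupBy("type") \
    .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))

# Pivot the already-aggregated (one row per type) result; first() has no
# constant-literal requirement, so pivot's IF-rewriting is harmless to it.
result = per_type.groupBy() \
    .pivot("type", ["a", "b"]) \
    .agg(F.first("percentiles"))
{code}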
