[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZygD updated SPARK-38614:
-------------------------
    Summary: df.show() shows incorrect F.percent_rank results  (was: df.show(3) does not equal df.show() first rows)

> df.show() shows incorrect F.percent_rank results
> ------------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: ZygD
>            Priority: Major
>
> Expected result is obtained using Spark 3.0.2, but not 3.2.1
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5)
> {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows
> {code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows
> {code}
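A note on the arithmetic, as a sketch rather than a diagnosis: percent_rank is defined as (rank - 1) / (rows in the partition - 1). The expected values match a window over all 101 rows. The actual values would match a window over only n + 1 rows for show(n), that is 4 rows for show(3) and 6 rows for show(5). That n + 1 reading is an assumption inferred from the numbers in this report, not a root cause confirmed in the ticket. The plain-Python snippet below reproduces both sets of numbers without Spark; the helper names percent_rank and shown are hypothetical and exist only for this illustration.

```python
def percent_rank(n_rows):
    """percent_rank of n_rows distinct, ordered values: (rank - 1) / (n_rows - 1)."""
    if n_rows < 2:
        # A single-row (or empty) partition has percent_rank 0.0 by definition.
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


def shown(total_rows, n, window_rows):
    """First n percent_rank values when the window sees only window_rows rows.

    Hypothetical helper: window_rows models how many rows reach the window.
    """
    return percent_rank(min(window_rows, total_rows))[:n]


# Expected: the window covers all 101 rows, then the first n are displayed.
expected_3 = shown(101, 3, window_rows=101)
expected_5 = shown(101, 5, window_rows=101)

# Assumption: the window only sees n + 1 rows for show(n).
observed_3 = shown(101, 3, window_rows=3 + 1)
observed_5 = shown(101, 5, window_rows=5 + 1)

print(expected_3, expected_5)
print(observed_3, observed_5)
```

If that assumption holds, the row limit is being applied before the window function is evaluated instead of after it, which would explain why the displayed values change with the argument passed to show().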