[ https://issues.apache.org/jira/browse/SPARK-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-22957:
-----------------------------------

    Assignee: Juliusz Sompolski

> ApproxQuantile breaks if the number of rows exceeds MaxInt
> ----------------------------------------------------------
>
>                 Key: SPARK-22957
>                 URL: https://issues.apache.org/jira/browse/SPARK-22957
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>             Fix For: 2.3.0
>
>
> ApproxQuantile overflows when number of rows exceeds 2.147B (max int32).
> If you run ApproxQuantile on a dataframe with 3B rows of 1 to 3B and ask it
> for 1/6 quantiles, it should return [0.5B, 1B, 1.5B, 2B, 2.5B, 3B]. However,
> in the [implementation of
> ApproxQuantile|https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala#L195],
> it calls .toInt on the target rank, which overflows at 2.147B.
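
The failure mode described in the report can be reproduced without Spark. The sketch below is not the code from QuantileSummaries.scala; the object and helper names (TargetRankSketch, rankAsInt, rankAsLong) are hypothetical, and it assumes the target rank is computed roughly as ceil(quantile * count). On the JVM, narrowing a Double to Int saturates at Int.MaxValue, so every rank above about 2.147B collapses to the same value, while narrowing to Long keeps the ranks distinct.

```scala
object TargetRankSketch {
  // Hypothetical stand-in for the buggy path: the rank is narrowed to Int,
  // so any value above Int.MaxValue is clamped to 2147483647.
  def rankAsInt(quantile: Double, count: Long): Int =
    math.ceil(quantile * count).toInt

  // Same computation narrowed to Long, which can hold ranks up to ~9.2e18.
  def rankAsLong(quantile: Double, count: Long): Long =
    math.ceil(quantile * count).toLong

  def main(args: Array[String]): Unit = {
    val count = 3000000000L                   // 3B rows, as in the report
    val quantiles = (1 to 6).map(_ / 6.0)     // 1/6, 2/6, ..., 6/6
    quantiles.foreach { q =>
      println(s"q=$q int=${rankAsInt(q, count)} long=${rankAsLong(q, count)}")
    }
  }
}
```

With 3B rows, the quantiles at 5/6 and 6/6 both map to rank Int.MaxValue in the Int variant, which is consistent with the wrong results described above; the Long variant returns distinct ranks for them.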