peay created SPARK-21550:
----------------------------
Summary: approxQuantiles throws "next on empty iterator" on empty data
Key: SPARK-21550
URL: https://issues.apache.org/jira/browse/SPARK-21550
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: peay
The documentation says:
{code}
null and NaN values will be removed from the numerical column before calculation. If
the dataframe is empty or the column only contains null or NaN, an empty array is returned.
{code}
However, this small PySpark example
{code}
# sql_context is assumed to be an existing SQLContext
from pyspark.sql.functions import col
sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)
{code}
throws
{code}
Py4JJavaError: An error occurred while calling o3493.approxQuantile.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike$class.head(IterableLike.scala:107)
at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
{code}
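Until the behaviour matches the documentation, one possible caller-side workaround is to check whether any usable rows remain before calling approxQuantile. The following is only a minimal sketch: `approx_quantile_or_empty` is a hypothetical helper name (not part of PySpark), and it assumes `df` is a `pyspark.sql.DataFrame` and `column` names a numeric column.

```python
def approx_quantile_or_empty(df, column, probabilities, relative_error):
    """Hypothetical helper: return [] instead of raising on empty input.

    Assumes `df` is a pyspark.sql.DataFrame and `column` is the name of a
    numeric column in it.
    """
    # Drop rows where the column is null or NaN, mirroring what the
    # documentation says approxQuantile does before computing quantiles.
    non_missing = df.select(column).na.drop()

    # head(1) returns a list with at most one Row; an empty list means
    # there is nothing left to summarise, so return the documented result.
    if not non_missing.head(1):
        return []

    return df.approxQuantile(column, probabilities, relative_error)
```

With the example above, `approx_quantile_or_empty(sql_context.range(10).filter(col("id") == 42), "id", [0.99], 0.001)` is expected to return an empty list rather than raise. The extra `head(1)` call triggers an additional Spark job, so this trades some performance for safety.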