Chris Suchanek created SPARK-30532:
--------------------------------------
Summary: DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
Key: SPARK-30532
URL: https://issues.apache.org/jira/browse/SPARK-30532
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Reporter: Chris Suchanek
DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified
column name (i.e. TABLE_NAME.COLUMN_NAME), which is often how you refer to
a column when working with joined DataFrames that have ambiguous column names.
See the code below for an example.
{code:java}
// Run in spark-shell, where sc and spark.implicits._ (needed for toDF) are in scope.
import scala.util.Random
val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = sc.parallelize(l).toDF("num").as("tt1")
val df2 = sc.parallelize(l).toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)
dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
// throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist.
// Available fields: num
dfx.stat.approxQuantile("num", Array(0.1), 0.0)
// throws: org.apache.spark.sql.AnalysisException: Reference 'num' is
// ambiguous, could be: tt2.num, tt1.num.;
{code}
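The first error message suggests that approxQuantile looks the given name up as a plain field of the DataFrame's schema, where the column is only called num. A possible workaround, sketched below and not confirmed in the issue itself, is to resolve the qualified reference in a select, rename it to an unambiguous plain name, and call approxQuantile on that single-column DataFrame. This assumes Spark 2.4.x with a local SparkSession; the helper qualifiedQuantiles and the output column name value are hypothetical, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.util.Random

object ApproxQuantileWorkaround {
  // Hypothetical helper (not a Spark API): resolve a qualified column such as
  // "tt1.num" inside select(), give it a plain top-level name, and only then
  // run approxQuantile, which then sees a single unambiguous field.
  def qualifiedQuantiles(
      df: DataFrame,
      qualifiedName: String,
      probabilities: Array[Double],
      relativeError: Double): Array[Double] = {
    val plainName = "value" // assumed placeholder name for the selected column
    df.select(col(qualifiedName).as(plainName))
      .stat
      .approxQuantile(plainName, probabilities, relativeError)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-30532-workaround")
      .getOrCreate()
    import spark.implicits._

    // Same data and aliases as the reproduction, built from a local Seq.
    val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
    val df1 = l.toDF("num").as("tt1")
    val df2 = l.toDF("num").as("tt2")
    val dfx = df2.crossJoin(df1)

    // Quantile of tt1.num only; avoids both errors from the report.
    val q = qualifiedQuantiles(dfx, "tt1.num", Array(0.1), 0.0)
    println(q.mkString(", "))

    spark.stop()
  }
}
```

Selecting first costs an extra projection but keeps approxQuantile's own name lookup trivial; aliasing the joined inputs' columns before the join (for example tt1_num and tt2_num) would be another way to sidestep the ambiguity.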
--
This message was sent by Atlassian Jira
(v8.3.4#803005)