Vinay varma created SPARK-19315:
-----------------------------------
Summary: StructType should support nested lookup; throws
IllegalArgumentException
Key: SPARK-19315
URL: https://issues.apache.org/jira/browse/SPARK-19315
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.2, 1.6.1
Reporter: Vinay varma
Priority: Minor
Datasets support class composition, and the .joinWith operation on a Dataset
also produces a composed (nested) type. However, StructType throws
IllegalArgumentException on a nested lookup. Because many validations look
columns up in the schema, they can currently be used with flattened datasets
only (e.g. org.apache.spark.ml.feature.StringIndexer).
Is there any reason for not supporting such operations?
From an initial check, it looks like adding support for such lookups would
break the existing contract of:
org.apache.spark.sql.types.StructType
def fieldIndex(name: String): Int
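
One way to get nested lookup without touching fieldIndex would be a separate
helper that walks the schema one path segment at a time. The sketch below is
only an illustration: NestedLookup, findNestedField and exampleSchema are
hypothetical names, not part of Spark's API. It also assumes that field names
never contain a literal '.', which the current contract does allow.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, NOT part of Spark: resolves a dotted path such as
// "a.id" by descending through nested StructTypes.
object NestedLookup {
  def findNestedField(schema: StructType, path: String): Option[StructField] = {
    @scala.annotation.tailrec
    def loop(current: StructType, remaining: List[String]): Option[StructField] =
      remaining match {
        case Nil => None
        case name :: rest =>
          current.fields.find(_.name == name) match {
            // Last segment: this is the field that was asked for.
            case Some(field) if rest.isEmpty => Some(field)
            // More segments left: descend only if the field is itself a struct.
            case Some(StructField(_, nested: StructType, _, _)) => loop(nested, rest)
            // Unknown name, or an attempt to descend into a non-struct field.
            case _ => None
          }
      }
    // Assumption: '.' is always a path separator, never part of a field name.
    loop(schema, path.split('.').toList)
  }

  // Illustrative schema shaped like the result of joinWith on two case classes.
  val exampleSchema: StructType = StructType(Seq(
    StructField("a", StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))),
    StructField("b", StructType(Seq(
      StructField("id", IntegerType),
      StructField("location", StringType))))))
}
```

Returning an Option instead of throwing keeps the helper independent of the
existing behaviour, where StructType.apply and fieldIndex raise
IllegalArgumentException for an unknown name.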
Example code, including the failing lookup:

import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

case class A(id: Int, name: String)
case class B(id: Int, location: String)

class TestCompositionStruct extends FunSuite {
  val spark = SparkSession.builder()
    .appName("TestCompositionStruct")
    .master("local[4]")
    .getOrCreate()
  import spark.implicits._

  val adf = spark.createDataFrame(List(A(1, "X"), A(2, "Y"))).as[A]
  val bdf = spark.createDataFrame(List(B(1, "X_loc"), B(2, "Y_loc"))).as[B]

  test("supportNestedDataset") {
    val jdf = adf.joinWith(bdf, adf("id") === bdf("id"))
      .withColumnRenamed("_1", "a")
      .withColumnRenamed("_2", "b")
      .as[(A, B)]
    // Column resolution understands the nested path "a.id" ...
    assert(jdf.select("a.id").count() > 0)
    // ... but StructType.apply treats "a.id" as a single field name and throws.
    intercept[IllegalArgumentException](jdf.schema("a.id"))
  }
}
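
Until nested lookup is supported, the flattening that schema-based validations
currently require can be done generically. The following is a minimal sketch
under stated assumptions: Flatten, leafPaths and flatten are hypothetical
names, nested paths are renamed by replacing '.' with '_' (so "a.id" becomes
"a_id"), and field names are assumed not to contain dots or to collide after
renaming.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper, NOT part of Spark.
object Flatten {
  // Collects the dotted path of every leaf (non-struct) field in the schema.
  def leafPaths(schema: StructType, prefix: String = ""): Seq[String] =
    schema.fields.toSeq.flatMap {
      case StructField(name, nested: StructType, _, _) =>
        leafPaths(nested, s"$prefix$name.")
      case StructField(name, _, _, _) =>
        Seq(s"$prefix$name")
    }

  // Selects every leaf as a top-level column, e.g. "a.id" becomes "a_id".
  def flatten(df: DataFrame): DataFrame = {
    val columns: Seq[Column] =
      leafPaths(df.schema).map(path => col(path).as(path.replace('.', '_')))
    df.select(columns: _*)
  }
}
```

With the jdf from the test above, Flatten.flatten(jdf.toDF()) should give a
flat schema in which a lookup such as schema("a_id") goes through the existing
StructType.apply unchanged.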