ShiHang Gao created SPARK-38038:
-----------------------------------
Summary: DataSourceV2 ORCTable can't read a partition that doesn't
contain "="
Key: SPARK-38038
URL: https://issues.apache.org/jira/browse/SPARK-38038
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.3
Reporter: ShiHang Gao
While testing SPARK-27919
(#[24768|https://github.com/apache/spark/pull/24768]), I tried to use the v2
ORC implementation to validate a v2 catalog that delegates to the session
catalog. The ORC implementation fails the following test case: the table has no
data, so it cannot infer a schema, but it should instead use the schema that the
table was created with.
Test case:
{code}
test("CreateTable: test ORC source") {
  spark.conf.set("spark.sql.catalog.session", classOf[V2SessionCatalog].getName)
  spark.sql(s"CREATE TABLE table_name (id bigint, data string) USING $orc2")

  val testCatalog = spark.catalog("session").asTableCatalog
  val table = testCatalog.loadTable(Identifier.of(Array(), "table_name"))

  assert(table.name == "orc ") // <-- should this be table_name?
  assert(table.partitioning.isEmpty)
  assert(table.properties == Map(
    "provider" -> orc2,
    "database" -> "default",
    "table" -> "table_name").asJava)
  assert(table.schema == new StructType()
    .add("id", LongType)
    .add("data", StringType)) // <-- fail

  val rdd = spark.sparkContext.parallelize(table.asInstanceOf[InMemoryTable].rows)
  checkAnswer(spark.internalCreateDataFrame(rdd, table.schema), Seq.empty)
}
{code}
Error:
{code}
Unable to infer schema for ORC. It must be specified manually.;
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$5(FileTable.scala:61)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:61)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:54)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:67)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:65)
  at org.apache.spark.sql.sources.v2.DataSourceV2SQLSuite.$anonfun$new$5(DataSourceV2SQLSuite.scala:82)
{code}
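To illustrate the behavior I would expect, here is a minimal sketch of the fallback order: use the schema the table was created with, infer from data files only when no schema was declared, and fail only when neither is available. This is a simplified, hypothetical model and not Spark's actual code; {{Field}}, {{Schema}}, {{SchemaResolution}} and {{resolve}} are names invented for this example.

```scala
// Hypothetical, simplified model of schema resolution; none of these names are Spark APIs.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

object SchemaResolution {
  // Prefer the declared (CREATE TABLE) schema, then try inference from files,
  // and report an error only when both are missing.
  def resolve(
      declared: Option[Schema],
      inferFromFiles: () => Option[Schema]): Either[String, Schema] =
    declared
      .orElse(inferFromFiles()) // by-name argument: evaluated only if `declared` is empty
      .toRight("Unable to infer schema for ORC. It must be specified manually.")
}

object SchemaResolutionDemo {
  def main(args: Array[String]): Unit = {
    val declared = Schema(Seq(Field("id", "bigint"), Field("data", "string")))
    val noFiles = () => Option.empty[Schema] // empty table: nothing to infer from

    // With a declared schema, an empty table still resolves.
    println(SchemaResolution.resolve(Some(declared), noFiles))
    // Without one, resolution fails, which mirrors the error above.
    println(SchemaResolution.resolve(None, noFiles))
  }
}
```

In this model the empty table from the test case resolves to the declared schema, because inference is attempted only when no schema was declared.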
--
This message was sent by Atlassian Jira
(v8.20.1#820001)