Repository: spark
Updated Branches:
refs/heads/master 70112ff22 -> eb8bfa3ea
[SPARK-9618] [SQL] Use the specified schema when reading Parquet files
The user-specified schema is currently ignored when loading Parquet files with the `parquet` method.
One workaround is to use the `format` and `load` methods instead of `parquet`,
e.g.:
```
val schema = ???
// schema is ignored
sqlContext.read.schema(schema).parquet("hdfs:///test")
// schema is retained
sqlContext.read.schema(schema).format("parquet").load("hdfs:///test")
```
The fix is simple, but I wonder if the `parquet` method should instead be
written in a similar fashion to `orc`:
```
def parquet(path: String): DataFrame = format("parquet").load(path)
```
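To illustrate why delegating would avoid this class of bug, here is a minimal sketch in plain Scala. It is a toy model only: `Reader`, `Relation`, `parquetIgnoringSchema` and `parquetDelegating` are hypothetical names made up for this example, and the schema is reduced to a list of column names. It is not Spark's `DataFrameReader` or `ParquetRelation`. Note that the committed change takes the other route and passes `userSpecifiedSchema` to the relation directly.

```scala
// Toy model, assuming a schema is just a list of column names.
// All names here are hypothetical and not part of Spark's API.
final case class Relation(source: String, path: String, schema: Option[Seq[String]])

final case class Reader(
    source: String = "parquet",
    userSpecifiedSchema: Option[Seq[String]] = None) {

  // Builder-style setters return a modified copy of the reader.
  def schema(columns: Seq[String]): Reader = copy(userSpecifiedSchema = Some(columns))
  def format(name: String): Reader = copy(source = name)

  // Generic path: forwards whatever schema the caller configured.
  def load(path: String): Relation = Relation(source, path, userSpecifiedSchema)

  // Shortcut that builds the relation itself and hard-codes None,
  // which is the pattern behind the reported behaviour.
  def parquetIgnoringSchema(path: String): Relation = Relation("parquet", path, None)

  // Shortcut that delegates to load, so it cannot drift from the generic path.
  def parquetDelegating(path: String): Relation = format("parquet").load(path)
}

object SchemaForwardingDemo {
  def main(args: Array[String]): Unit = {
    val reader = Reader().schema(Seq("id", "name"))
    println(reader.parquetIgnoringSchema("hdfs:///test").schema) // None
    println(reader.parquetDelegating("hdfs:///test").schema)     // Some(List(id, name))
  }
}
```

In this model the delegating shortcut has no separate constructor call to keep in sync, which is the appeal of writing `parquet` the way `orc` is written.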
Author: Nathan Howell <[email protected]>
Closes #7947 from NathanHowell/SPARK-9618 and squashes the following commits:
d1ea62c [Nathan Howell] [SPARK-9618] [SQL] Use the specified schema when
reading Parquet files
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eb8bfa3e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eb8bfa3e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eb8bfa3e
Branch: refs/heads/master
Commit: eb8bfa3eaa0846d685e4d12f9ee2e4273b85edcf
Parents: 70112ff
Author: Nathan Howell <[email protected]>
Authored: Wed Aug 5 22:16:56 2015 +0800
Committer: Cheng Lian <[email protected]>
Committed: Wed Aug 5 22:16:56 2015 +0800
----------------------------------------------------------------------
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/eb8bfa3e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index eb09807..b90de8e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -260,7 +260,7 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
sqlContext.baseRelationToDataFrame(
new ParquetRelation(
- globbedPaths.map(_.toString), None, None, extraOptions.toMap)(sqlContext))
+ globbedPaths.map(_.toString), userSpecifiedSchema, None, extraOptions.toMap)(sqlContext))
}
}