Repository: spark
Updated Branches:
refs/heads/branch-1.5 8b00c0690 -> d5f788121
[SPARK-9618] [SQL] Use the specified schema when reading Parquet files
The user-specified schema is currently ignored when loading Parquet files
with the `parquet` method. One workaround is to use the `format` and `load`
methods instead, e.g.:
```
val schema = ???
// schema is ignored
sqlContext.read.schema(schema).parquet("hdfs:///test")
// schema is retained
sqlContext.read.schema(schema).format("parquet").load("hdfs:///test")
```
The fix is simple, but I wonder if the `parquet` method should instead be
written in a similar fashion to `orc`:
```
def parquet(path: String): DataFrame = format("parquet").load(path)
```
Author: Nathan Howell <[email protected]>
Closes #7947 from NathanHowell/SPARK-9618 and squashes the following commits:
d1ea62c [Nathan Howell] [SPARK-9618] [SQL] Use the specified schema when reading Parquet files
(cherry picked from commit eb8bfa3eaa0846d685e4d12f9ee2e4273b85edcf)
Signed-off-by: Reynold Xin <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d5f78812
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d5f78812
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d5f78812
Branch: refs/heads/branch-1.5
Commit: d5f788121bebc3266e961d2e9042fe9a4049c8a4
Parents: 8b00c06
Author: Nathan Howell <[email protected]>
Authored: Wed Aug 5 22:16:56 2015 +0800
Committer: Reynold Xin <[email protected]>
Committed: Thu Aug 6 13:19:17 2015 -0700
----------------------------------------------------------------------
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/d5f78812/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index eb09807..b90de8e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -260,7 +260,7 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
       sqlContext.baseRelationToDataFrame(
         new ParquetRelation(
-          globbedPaths.map(_.toString), None, None, extraOptions.toMap)(sqlContext))
+          globbedPaths.map(_.toString), userSpecifiedSchema, None, extraOptions.toMap)(sqlContext))
     }
   }
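
The one-argument change in the hunk above can be illustrated with a small, dependency-free toy model. This is only a sketch: `Schema`, `Relation`, `Reader`, `parquetBefore`, and `parquetAfter` are hypothetical stand-ins invented for illustration, not Spark's real classes or methods, and the "fall back to the schema found in the files" behaviour is an assumption about what passing `None` means.

```
// Toy model of the fix; none of these types are Spark's real classes.
object SchemaPropagationSketch {
  // Hypothetical stand-in for a schema: an ordered list of column names.
  final case class Schema(columns: Seq[String])

  // Hypothetical stand-in for ParquetRelation: it keeps the caller's schema
  // if one was passed, otherwise it falls back to the inferred one (assumed).
  final case class Relation(paths: Seq[String], maybeSchema: Option[Schema]) {
    def schema(inferred: Schema): Schema = maybeSchema.getOrElse(inferred)
  }

  // Hypothetical stand-in for DataFrameReader.
  final class Reader(userSpecifiedSchema: Option[Schema] = None) {
    def schema(s: Schema): Reader = new Reader(Some(s))

    // Before the fix: the relation is always built with None.
    def parquetBefore(paths: String*): Relation = Relation(paths, None)

    // After the fix: the reader's own schema is forwarded.
    def parquetAfter(paths: String*): Relation = Relation(paths, userSpecifiedSchema)
  }

  def main(args: Array[String]): Unit = {
    val inferred = Schema(Seq("id", "name", "payload"))
    val wanted = Schema(Seq("id", "name"))
    val reader = new Reader().schema(wanted)

    // The pre-fix path drops `wanted`; the post-fix path keeps it.
    println(reader.parquetBefore("hdfs:///test").schema(inferred))
    println(reader.parquetAfter("hdfs:///test").schema(inferred))
  }
}
```

In this model the pre-fix path resolves to the inferred schema even though one was supplied, which mirrors the behaviour described in the commit message; the post-fix path resolves to the supplied schema.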