Konstantin Shaposhnikov created SPARK-6566:
----------------------------------------------
Summary: Update Spark to use the latest version of Parquet libraries
Key: SPARK-6566
URL: https://issues.apache.org/jira/browse/SPARK-6566
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.0
Reporter: Konstantin Shaposhnikov
The latest version of Parquet (1.6.0rc7) contains many bug fixes, e.g. PARQUET-136.
It would be good to update Spark to use this version.
The following changes are required:
{code}
diff --git a/pom.xml b/pom.xml
index 5ad39a9..095b519 100644
--- a/pom.xml
+++ b/pom.xml
@@ -132,7 +132,7 @@
<!-- Version used for internal directory structure -->
<hive.version.short>0.13.1</hive.version.short>
<derby.version>10.10.1.1</derby.version>
- <parquet.version>1.6.0rc3</parquet.version>
+ <parquet.version>1.6.0rc7</parquet.version>
<jblas.version>1.2.3</jblas.version>
<jetty.version>8.1.14.v20131031</jetty.version>
<orbit.version>3.0.0.v201112011016</orbit.version>
{code}
and
{code}
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
@@ -480,7 +480,7 @@ private[parquet] class FilteringParquetRowInputFormat
globalMetaData = new GlobalMetaData(globalMetaData.getSchema,
mergedMetadata, globalMetaData.getCreatedBy)
- val readContext = getReadSupport(configuration).init(
+ val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
new InitContext(configuration,
globalMetaData.getKeyValueMetaData,
globalMetaData.getSchema))
{code}
I am happy to prepare a pull request if necessary.