GitHub user laserson opened a pull request: https://github.com/apache/incubator-spark/pull/576
Added parquetFileAsJSON to read Parquet data into JSON strings

This function makes it incredibly easy to read Parquet data, especially with PySpark. Is there any interest in this? It requires pulling in some Parquet dependencies and adding some Parquet jars to SPARK_CLASSPATH.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-spark pyspark-parquet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-spark/pull/576.patch

----
commit 2e1969e33da97253eb3dccf51e54afb469ed9fd5
Author: Uri Laserson <laser...@cloudera.com>
Date: 2014-02-10T01:28:08Z

Added parquetFileAsJSON to read Parquet data into JSON strings

----
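The appeal of returning Parquet records as JSON strings is that Python code can decode them with the standard library alone. The sketch below is illustrative only. It assumes that each string holds one JSON object per Parquet record, which the PR title suggests but does not spell out. The helper name `parse_json_records` is hypothetical and does not come from the patch.

```python
import json

def parse_json_records(lines):
    """Decode JSON strings into Python dicts.

    Hypothetical helper: assumes each string is one JSON object
    representing a single Parquet record.
    """
    return [json.loads(line) for line in lines]

# Example with two hand-written records standing in for converted Parquet rows.
records = parse_json_records(['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}'])
print(records[1]["name"])  # prints "b"
```

In PySpark, the same per-record decoding would most likely be expressed by passing `json.loads` to `RDD.map` on whatever RDD of strings the new function returns. The exact Python-side entry point is not specified in this PR description.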