[ 
https://issues.apache.org/jira/browse/BEAM-5164?focusedWorklogId=294827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294827
 ]

ASF GitHub Bot logged work on BEAM-5164:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Aug/19 16:18
            Start Date: 14/Aug/19 16:18
    Worklog Time Spent: 10m 
      Work Description: RyanSkraba commented on pull request #9339: 
[BEAM-5164]: Add documentation for ParquetIO.
URL: https://github.com/apache/beam/pull/9339#discussion_r313963888
 
 

 ##########
 File path: website/src/documentation/io/built-in-parquet.md
 ##########
 @@ -0,0 +1,141 @@
+---
+layout: section
+title: "Apache Parquet I/O connector"
+section_menu: section-menu/documentation.html
+permalink: /documentation/io/built-in/parquet/
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+[Built-in I/O Transforms]({{site.baseurl}}/documentation/io/built-in/)
+
+# Apache Parquet I/O connector
+
+<nav class="language-switcher">
+  <strong>Adapt for:</strong>
+  <ul>
+    <li data-type="language-java" class="active">Java SDK</li>
+    <li data-type="language-py">Python SDK</li>
+  </ul>
+</nav>
+
+The Beam SDKs include built-in transforms that can read data from and write data
+to [Apache Parquet](https://parquet.apache.org) files.
+
+## Before you start
+
+<!-- Java specific -->
+
+{:.language-java}
+To use ParquetIO, add the Maven artifact dependency to your `pom.xml` file.
+
+```java
+<dependency>
+    <groupId>org.apache.beam</groupId>
+    <artifactId>beam-sdks-java-io-parquet</artifactId>
+    <version>{{ site.release_latest }}</version>
+</dependency>
+```
+
+{:.language-java}
+Additional resources:
+
+{:.language-java}
+* [ParquetIO source code](https://github.com/apache/beam/blob/master/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java)
+* [ParquetIO Javadoc](https://beam.apache.org/releases/javadoc/{{ site.release_latest }}/org/apache/beam/sdk/io/parquet/ParquetIO.html)
+
+<!-- Python specific -->
+
+{:.language-py}
+ParquetIO comes preinstalled with the Apache Beam Python SDK.
+
+{:.language-py}
+Additional resources:
+
+{:.language-py}
+* [ParquetIO source code](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio.py)
+* [ParquetIO Pydoc](https://beam.apache.org/releases/pydoc/{{ site.release_latest }}/apache_beam.io.parquetio.html)
+
+
+### Using ParquetIO with Spark before 2.4
 
 Review comment:
   Neat!  I didn't know about that site -- I'll take a careful look.
   
   I copied the structure from the hcatalog page, which has the same issue...  I'll fix this one (later this week, unfortunately) and apply it to hcatalog in the future.
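
   For context on the page under review: the connector it documents is typically wired up along the following lines. This is only an illustrative sketch (not part of the patch), assuming Avro `GenericRecord` elements, a placeholder schema and paths, and the `ParquetIO.read` / `ParquetIO.sink` entry points from the linked Javadoc.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ParquetIOSketch {
  public static void main(String[] args) {
    // Placeholder Avro schema; a real pipeline would define or load its own.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":"
            + "[{\"name\":\"value\",\"type\":\"string\"}]}");

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read GenericRecords from existing Parquet files.
    PCollection<GenericRecord> records =
        p.apply(ParquetIO.read(schema).from("/path/to/input*.parquet"));

    // Write them back out as Parquet via FileIO + ParquetIO.sink.
    records.apply(
        FileIO.<GenericRecord>write()
            .via(ParquetIO.sink(schema))
            .to("/path/to/output/")
            .withSuffix(".parquet"));

    p.run().waitUntilFinish();
  }
}
```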
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 294827)
    Time Spent: 40m  (was: 0.5h)

> ParquetIOIT fails on Spark and Flink
> ------------------------------------
>
>                 Key: BEAM-5164
>                 URL: https://issues.apache.org/jira/browse/BEAM-5164
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When run on a remote Spark or Flink cluster, ParquetIOIT fails with the following stacktrace:
> {code:java}
> org.apache.beam.sdk.io.parquet.ParquetIOIT > writeThenReadAll FAILED
>     org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
>         at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:66)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:87)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:116)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:61)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>         at org.apache.beam.sdk.io.parquet.ParquetIOIT.writeThenReadAll(ParquetIOIT.java:133)
>     Caused by:
>     java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V{code}
>  
>  
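
A `NoSuchMethodError` like the one above usually means an older Parquet release, one that predates the `OutputFile`-based `ParquetWriter.Builder` constructor, is winning on the cluster classpath; Spark versions before 2.4 typically bundle such a Parquet, which is what the "Using ParquetIO with Spark before 2.4" section in the patch addresses. A quick diagnostic (a sketch, not an official Beam utility) is to print which jar `ParquetWriter` is actually loaded from on the worker:

```java
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetClasspathCheck {
  public static void main(String[] args) {
    // Prints the jar ParquetWriter is loaded from. If it points at a
    // Spark-provided Parquet older than the one ParquetIO was built against,
    // the OutputFile-based builder constructor is missing at runtime.
    // Note: getCodeSource() can be null for bootstrap-loaded classes.
    System.out.println(
        ParquetWriter.class.getProtectionDomain().getCodeSource().getLocation());
  }
}
```

If the older jar is the one being loaded, one common mitigation is to ensure the newer Parquet classes take precedence in the job jar (for example by shading/relocating them); the exact recommended steps are the subject of the documentation section being added in this PR.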



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
