JingGe commented on a change in pull request #19083:
URL: https://github.com/apache/flink/pull/19083#discussion_r826874274



##########
File path: docs/content/docs/connectors/datastream/formats/parquet.md
##########
@@ -39,46 +39,71 @@ To use the format you need to add the Flink Parquet 
dependency to your project:
        <version>{{< version >}}</version>
 </dependency>
 ```
- 
+
+For reading Avro records, parquet-avro dependency is required additionally:
+
+```xml
+<dependency>
+    <groupId>org.apache.parquet</groupId>
+    <artifactId>parquet-avro</artifactId>
+    <version>${flink.format.parquet.version}</version>
+    <optional>true</optional>
+    <exclusions>
+        <exclusion>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-client</artifactId>
+        </exclusion>
+        <exclusion>
+            <groupId>it.unimi.dsi</groupId>
+            <artifactId>fastutil</artifactId>
+        </exclusion>
+    </exclusions>

Review comment:
       Thanks for the suggestion. Beyond the reasons you mentioned above, my 
additional understanding of the exclusions is that `flink-parquet` does not 
call any APIs or use any functionality of the two excluded direct 
dependencies, `hadoop-client` and `fastutil`. It is therefore up to users 
whether to declare them as `provided` or with any other scope in their own 
pom.xml. The Flink documentation only shows how to declare the `parquet-avro` 
dependency, so keeping the exclusions makes more sense here. 
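
To illustrate the point about the scope being the user's decision, a user's 
own pom.xml might declare one of the excluded artifacts as shown below. This 
is only a sketch: the `hadoop.version` property is a placeholder I am 
assuming, not something defined by Flink, and `provided` is just one possible 
choice of scope. 

```xml
<!-- Hypothetical fragment of a user's pom.xml, not part of the Flink docs. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Assumed property; set it to the Hadoop version used by your cluster. -->
    <version>${hadoop.version}</version>
    <!-- The user picks the scope; "provided" assumes the cluster ships Hadoop. -->
    <scope>provided</scope>
</dependency>
```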




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


