nastra commented on a change in pull request #2826:
URL: https://github.com/apache/iceberg/pull/2826#discussion_r678045366



##########
File path: build.gradle
##########
@@ -247,13 +250,30 @@ project(':iceberg-data') {
       exclude group: 'org.slf4j', module: 'slf4j-log4j12'
     }
 
-    testCompile("org.apache.hadoop:hadoop-client") {
+    implementation("org.apache.orc:orc-core::nohive") {

Review comment:
       > This adds implementation dependencies for org.apache.avro:avro and 
org.apache.orc:orc-core. Why isn't there one for org.apache.parquet?
   
   Judging by the 3 parquet imports that are being used in `:iceberg-data`, we 
have:
   
   ```
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.hadoop.metadata.ParquetMetadata;
   ```
   These 2 are satisfied by `org.apache.parquet:parquet-hadoop`. and then 
`import org.apache.parquet.Preconditions` (which should probably be replaced 
with `org.apache.iceberg.relocated.com.google.common.base.Preconditions`), 
which is being satisfied by `org.apache.parquet:parquet-common`. Both of these 
libs are part of `org.apache.parquet:parquet-avro` as can be seen below:
   
   ```
   
   \--- org.apache.parquet:parquet-avro -> 1.12.0
        +--- org.apache.parquet:parquet-column:1.12.0
        |    +--- org.apache.parquet:parquet-common:1.12.0
        |    |    +--- org.apache.parquet:parquet-format-structures:1.12.0
        |    |    |    +--- org.slf4j:slf4j-api:1.7.22 -> 1.7.25
        |    |    |    \--- javax.annotation:javax.annotation-api:1.3.2
        |    |    +--- org.slf4j:slf4j-api:1.7.22 -> 1.7.25
        |    |    \--- org.apache.yetus:audience-annotations:0.12.0
        |    \--- org.apache.parquet:parquet-encoding:1.12.0
        |         \--- org.apache.parquet:parquet-common:1.12.0 (*)
        +--- org.apache.parquet:parquet-hadoop:1.12.0
        |    +--- org.apache.parquet:parquet-column:1.12.0 (*)
        |    +--- org.apache.parquet:parquet-format-structures:1.12.0 (*)
        |    +--- org.apache.parquet:parquet-jackson:1.12.0
        |    +--- org.xerial.snappy:snappy-java:1.1.8
        |    +--- commons-pool:commons-pool:1.6
        |    \--- com.github.luben:zstd-jni:1.4.9-1
        \--- org.apache.parquet:parquet-format-structures:1.12.0 (*)
   ```
   Thus `org.apache.parquet:parquet-avro` carries everything we need for 
`:iceberg-data` in this case. Does that answer your question?
   
   
   > Can you be more specific about what you're saying with implementation vs 
compile? When are transitive dependencies included and when are they not?
   
   You could technically just replace the old `compile` configuration with the 
new `api` configuration, but according to  [Gradle 
docs](https://docs.gradle.org/current/userguide/java_library_plugin.html#java_library_plugin)
 it is better to use `implementation`, since compilation will be faster and 
other dependencies don't **leak** into your classpath during compilation. Below 
is the section from the Gradle docs that I'm referring to:
   
   > Dependencies appearing in the api configurations will be transitively 
exposed to consumers of the library, and as such will appear on the compile 
classpath of consumers. Dependencies found in the implementation configuration 
will, on the other hand, not be exposed to consumers, and therefore not leak 
into the consumers' compile classpath. This comes with several benefits:
   
   >* dependencies do not leak into the compile classpath of consumers anymore, 
so you will never accidentally depend on a transitive dependency
   
   >* faster compilation thanks to reduced classpath size
   
   less recompilations when implementation dependencies change: consumers would 
not need to be recompiled
   
   cleaner publishing: when used in conjunction with the new maven-publish 
plugin, Java libraries produce POM files that distinguish exactly between what 
is required to compile against the library and what is required to use the 
library at runtime (in other words, don’t mix what is needed to compile the 
library itself and what is needed to compile against the library).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to