Thanks for the quick reply. We are using the Iceberg artifacts directly, not iceberg-spark-runtime, so it looks like iceberg-spark-runtime should fix the issues I am seeing. (To be frank, I need to ramp up on the new Gradle changes in the Iceberg project; the previous Gradle setup was familiar and easier to follow. :))
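[For anyone following the thread: switching to the shaded runtime is a one-line build change. A minimal sketch in the Gradle Kotlin DSL follows; the group/artifact coordinates and the version are assumptions for illustration, so check what your Iceberg build actually publishes.]

```kotlin
// build.gradle.kts -- a sketch; coordinates and version are assumptions,
// substitute whatever your Iceberg build publishes.
dependencies {
    // The shaded runtime jar relocates Jackson (and Avro, Parquet, etc.)
    // under an Iceberg-private package prefix, so it cannot clash with the
    // older Jackson that Spark 2.4 already has on its classpath.
    implementation("org.apache.iceberg:iceberg-spark-runtime:0.7.0-incubating")
}
```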
I will give it a try using iceberg-spark-runtime and share an update.

-- Thanks

On Thu, Oct 10, 2019 at 2:29 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi, thanks for reporting this.
>
> We updated Jackson to 2.9.10 earlier today because of a vulnerability in
> Jackson. See
> https://github.com/apache/incubator-iceberg/commit/cbefc10b5b3fd4f8e4f92251b439eab3b64ea14d
>
> How are you using Iceberg with Spark 2.4? I thought that the change should
> be safe because the recommended way to integrate with Spark is to use the
> iceberg-spark-runtime jar that shades and relocates Jackson. That way, it
> doesn't conflict with the versions used by Spark. Are you using the
> iceberg-spark-runtime build?
>
> rb
>
> On Thu, Oct 10, 2019 at 2:24 PM suds <sudssf2...@gmail.com> wrote:
>
>> I am also seeing issues when using the master branch with Spark 2.4.0+:
>>
>> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala
>> module 2.8.8 requires Jackson Databind version >= 2.8.0 and < 2.9.0
>>   at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:66)
>>   at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:18)
>>   at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:730)
>>   at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
>>   at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
>>   ... 24 more
>>
>> or:
>>
>> Caused by: com.fasterxml.jackson.databind.JsonMappingException:
>> Incompatible Jackson version: 2.7.9
>>   at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
>>   at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
>>   at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:730)
>>   at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
>>   at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
>>   ... 24 more
>>
>> From another thread there was a discussion about shaded jars:
>>
>> "Iceberg should provide shaded jars to make it easy to get started with
>> Spark. We also want to shade Parquet, Avro, and others to ensure that
>> Iceberg's dependencies can be updated without conflicting with what Spark
>> uses. Libraries like slf4j-api should be fine to exclude because they
>> change rarely, though."
>>
>> What are the best practices when deploying Iceberg on EMR? Do we need to
>> create a shaded jar with all the Avro, Parquet, and Jackson dependencies?
>>
>> On Sat, Jun 22, 2019 at 10:23 PM RD <rdsr...@gmail.com> wrote:
>>
>>> Hi Iceberg devs,
>>>
>>> I see that guava and slf4j-api are compileOnly dependencies. This
>>> implies that they are not required at runtime and will not be resolved
>>> when resolving Iceberg artifacts. So it might very well be the case
>>> that, for example, for iceberg-spark, the guava dependency used at
>>> runtime would come from Spark itself, which could be different from
>>> what we intended.
>>>
>>> I think these should be changed to compile, as they are required
>>> dependencies. Thoughts?
>>>
>>> Today, the iceberg-runtime and iceberg-presto-runtime artifacts will
>>> not include these dependencies because they are declared as compileOnly
>>> and we have configured the shadow tasks to pick dependencies from the
>>> "shadow" configuration.
>>>
>>> I think slf4j and guava should be part of these Iceberg runtime
>>> artifacts, no?
>>>
>>> Also, iceberg-[presto]-runtime reconfigures/recreates the "shadow"
>>> configuration:
>>> https://imperceptiblethoughts.com/shadow/configuration/#configuring-the-runtime-classpath.
>>> This configuration is reserved by the Shadow task for transitive
>>> dependencies that are not to be bundled in the fat jar.
>>>
>>> I think we should not recreate the "shadow" configuration, and should
>>> let the shadow task use the standard runtime/compile configurations.
>>>
>>> My last question is: what is the expected/recommended way to use
>>> Iceberg artifacts in a runtime such as Spark? Should thin jars with
>>> transitive dependencies be used, or an Iceberg runtime jar with shaded
>>> dependencies [the most common dependencies that could conflict, e.g.
>>> guava, avro]?
>>>
>>> -R
>
> --
> Ryan Blue
> Software Engineer
> Netflix
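[To make the shading point in the thread concrete: the relocation Ryan describes can be sketched with the Gradle Shadow plugin. This is not Iceberg's actual build script; the plugin version and the relocated package prefix below are assumptions for illustration.]

```kotlin
// build.gradle.kts -- sketch of class relocation with the Shadow plugin.
// Plugin version and the shaded package prefix are assumptions.
plugins {
    java
    id("com.github.johnrengelman.shadow") version "5.1.0"
}

tasks.shadowJar {
    // Rewrite every reference to com.fasterxml.jackson in the bundled
    // classes to a private prefix. The bundled Jackson 2.9.x then loads
    // side by side with Spark's own Jackson instead of replacing it,
    // which avoids the "Incompatible Jackson version" failures above.
    relocate("com.fasterxml.jackson", "org.apache.iceberg.shaded.com.fasterxml.jackson")
}
```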