picarro-sdivakar opened a new issue, #13691:
URL: https://github.com/apache/iceberg/issues/13691

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   I’m trying to use the Kafka Connect Iceberg sink connector built from the 
latest main branch (v1.10.0-SNAPSHOT), following the instructions in the 
[iceberg-kafka-connect-runtime](https://github.com/apache/iceberg/tree/main/kafka-connect/kafka-connect-runtime)
 module.
   
   After unzipping the built connector distribution into the Kafka Connect plugin path 
and starting a sink connector configured with a `hadoop` catalog 
(`iceberg.catalog.type=hadoop`), the connector fails with the following errors:
   
   ```
   java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:43)
         ...
   Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration2.Configuration
         at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
   ```

   and:

   ```
   java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation
         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
         ...
   ```

   Steps to Reproduce
   
   1. Check out https://github.com/apache/iceberg and build the connector with 
   `./gradlew build -x test -x integrationTest`.

   2. Copy the zip output from 
`build/distributions/iceberg-kafka-connect-runtime-1.10.0-SNAPSHOT.zip` to the Kafka 
Connect plugin path and unzip it.

   3. Start the connector by submitting the following configuration:

      ```json
      {
        "name": "iceberg-sink",
        "config": {
          "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
          "tasks.max": "1",
          "topics": "company.product.conc.level2",

          "iceberg.catalog.name": "hadoop",
          "iceberg.catalog.type": "hadoop",
          "iceberg.catalog.warehouse": "s3a://domain/product/iceberg/",
          "iceberg.catalog.hadoop.fs.s3a.endpoint": "<>:443",
          "iceberg.catalog.hadoop.fs.s3a.access.key": "<>",
          "iceberg.catalog.hadoop.fs.s3a.secret.key": "<>",
          "iceberg.catalog.hadoop.fs.s3a.path.style.access": "<>",
          "iceberg.catalog.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

          "flush.size": "500",
          "key.converter": "org.apache.kafka.connect.storage.StringConverter",
          "value.converter": "com.company.cdp.avroconverter.CustomAvroConverter",
          "value.converter.avro.schema.file": "/etc/kafka-connect/data.avsc",
          "value.converter.schemas.enable": "true",
          "iceberg.tables": "db.processed_concentration",

          "iceberg.table.schema.file": "/etc/kafka-connect/data.avsc"
        }
      }
      ```

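   Analysis

   It seems that the `commons-configuration2` library, a transitive Hadoop 
dependency, is not bundled into the connector distribution. The first trace shows 
Hadoop's `DefaultMetricsSystem` failing to load it, and `UserGroupInformation` 
appears to initialize the metrics system during its own static initialization. 
Once a class's static initialization fails, the JVM reports every later access to 
that class as `NoClassDefFoundError: Could not initialize class ...`, which would 
explain the second error from `FileSystem.get`. Because Kafka Connect loads each 
plugin in its own isolated classloader, these dependencies have to be bundled with 
the plugin itself.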
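To illustrate the JVM behavior behind the second error, here is a minimal, self-contained sketch. `StaticInitFailureDemo` and `Fragile` are hypothetical names: `Fragile` stands in for a class such as `UserGroupInformation`, and the missing library is simulated by throwing `NoClassDefFoundError` from a static initializer rather than by actually removing a JAR from the classpath.

```java
// Hypothetical demo: "Fragile" stands in for a class like UserGroupInformation
// whose static initialization needs a library that is not on the classpath.
public class StaticInitFailureDemo {

    static class Fragile {
        // Not a compile-time constant, so reading it forces class initialization.
        static final int VALUE = init();

        private static int init() {
            // Simulates the JVM failing to resolve a class that is missing from
            // the plugin classpath while Fragile is being initialized.
            throw new NoClassDefFoundError("org/apache/commons/configuration2/Configuration");
        }
    }

    // Reads Fragile.VALUE and reports which error (if any) was thrown.
    public static String touch() {
        try {
            return "initialized: " + Fragile.VALUE;
        } catch (NoClassDefFoundError e) {
            return "NoClassDefFoundError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // First access: the original error from the static initializer propagates.
        System.out.println(touch());
        // Later access: the class is now in an error state and is not re-initialized.
        System.out.println(touch());
    }
}
```

On the first call, the original `NoClassDefFoundError` propagates unchanged (it is an `Error`, so the JVM does not wrap it in `ExceptionInInitializerError`); on the second call, the JVM reports `Could not initialize class ...` instead, the same shape as the `FileSystem.get` failure. This suggests the second error is a symptom of the first rather than a separate missing dependency.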
   
   Suggested Fix

   Could we consider one of the following?

   - Adding the missing `commons-configuration2` dependency (and any other Hadoop 
transitive dependencies required at runtime) to the `runtimeClasspath` of 
`iceberg-kafka-connect-runtime`.
   - Publishing a shaded uber-JAR, as some other connectors do.
   - Updating the documentation to guide users to manually add the required Hadoop 
dependencies to the plugin path.
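Whichever option is chosen, it can help to confirm which JARs in the unzipped plugin directory actually provide a given class. The sketch below is a hypothetical helper (`PluginJarScanner` is not part of Iceberg or Kafka Connect) that walks a directory recursively and opens each `.jar` with `java.util.jar.JarFile`; it assumes the plugin's dependencies are plain JAR files somewhere under the directory passed in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists every JAR under a plugin directory that contains a class.
public class PluginJarScanner {

    // Converts a top-level class name "a.b.C" into the JAR entry name "a/b/C.class".
    public static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Recursively collects *.jar files under pluginDir and returns those containing className.
    public static List<Path> jarsContaining(Path pluginDir, String className) throws IOException {
        String entry = entryName(className);
        List<Path> jars;
        try (Stream<Path> files = Files.walk(pluginDir)) {
            jars = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                if (jarFile.getJarEntry(entry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PluginJarScanner <plugin-dir>");
            System.exit(2);
        }
        Path pluginDir = Paths.get(args[0]);
        // Classes named in the stack traces above.
        String[] classes = {
            "org.apache.commons.configuration2.Configuration",
            "org.apache.hadoop.security.UserGroupInformation"
        };
        for (String cls : classes) {
            List<Path> hits = jarsContaining(pluginDir, cls);
            System.out.println(cls + " -> " + (hits.isEmpty() ? "NOT FOUND" : hits.toString()));
        }
    }
}
```

Run it with the unzipped plugin directory as its only argument (on Java 11+, `java PluginJarScanner.java <plugin-dir>` works without a separate compile step). A `NOT FOUND` next to `org.apache.commons.configuration2.Configuration` would be consistent with the errors above, unless the class is provided on the worker's own classpath.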
   
   Please advise on the preferred direction. Happy to contribute a patch if 
needed.
   
   
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.