[I] [spark connector] dependency conflict [pinot]

via GitHub Wed, 19 Mar 2025 23:36:08 -0700


tsekityam opened a new issue, #15320:
URL: https://github.com/apache/pinot/issues/15320


   ## What did I do
   
   1. Checkout the code at #15315
   
         The code there is for authorization and has nothing to do with the issue 
mentioned in this report.
     
   2. Build `pinot-spark-3-connector` with `./mvnw clean package -DskipTests 
-Papache-release -pl pinot-connectors/pinot-spark-3-connector/ -am`
   
       I built the jar with this dev container
   
       ```json
       {
        "name": "Java",
        "image": "mcr.microsoft.com/devcontainers/java:1-17-bookworm",
       
        "features": {
                "ghcr.io/devcontainers/features/java:1": {
                        "version": "none",
                        "installMaven": "true",
                        "installGradle": "false"
                }
        }    
       }
       ```
   
   3. Create a databricks cluster with databricks runtime 15.4 LTS (includes 
Apache Spark 3.5.0, Scala 2.12)
   
       The environment variable `JNAME=zulu17-ca-amd64` must be set to enable 
Java 17 on the cluster.
   
   4. Upload the following jars we just built to the Databricks cluster
   
       * pinot-spark-3-connector-1.4.0-SNAPSHOT.jar
       * pinot-spark-common-1.4.0-SNAPSHOT.jar
       * pinot-spi-1.4.0-SNAPSHOT.jar
   
   5. In addition to the above jars, install the following dependencies on the 
cluster
   
       - org.apache.httpcomponents.client5:httpclient5:5.4.2
       - io.circe:circe-generic_2.12:0.14.12
   
   6. Run the following Python code on the cluster
   
       ```py
       df = (
           spark
           .read
           .format("pinot")
           .option("controller", "xxxx")
           .option("broker", "xxxx")
           .option("table", "xxxx")
           .option("tableType", "offline")
           .option("authorization", "xxxx")
           .load()
       )
       
       display(df)
       ```
   
   ## What did I see
   ```
   Py4JJavaError: An error occurred while calling t.addCustomDisplayData.
   : java.lang.NoClassDefFoundError: Could not initialize class 
io.circe.Decoder$
        at 
org.apache.pinot.connector.spark.common.PinotClusterClient$.$anonfun$getRoutingTableForQuery$1(PinotClusterClient.scala:201)
        at scala.util.Try$.apply(Try.scala:213)
        at 
org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:197)
        at 
org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTable(PinotClusterClient.scala:150)
        at 
org.apache.pinot.connector.spark.v3.datasource.PinotScan.planInputPartitions(PinotScan.scala:57)
   
   {skipped}
   
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: java.lang.ExceptionInInitializerError: Exception 
java.lang.NoSuchMethodError: 'void 
cats.kernel.CommutativeSemigroup.$init$(cats.kernel.CommutativeSemigroup)' [in 
thread "Thread-125"]
        at cats.UnorderedFoldable$$anon$1.<init>(UnorderedFoldable.scala:131)
   
   {skipped}
   
   ```
   
   ## What did I expect
   
   The Pinot data should be displayed without error.
   
   ## What went wrong
   
   The error I saw is very similar to 
https://github.com/typelevel/cats/issues/3628, so I think they are related. 
It looks like the `cats` on the cluster's system path conflicts with the `cats` 
required by `circe-generic`.
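
One way to check this suspicion is to look for jars that ship the class named in the `NoSuchMethodError`. The following is a minimal sketch using only the Python standard library; the helper name `find_class_providers` is hypothetical, and the directory to scan is an assumption that depends on where the cluster keeps its jars.

```python
import zipfile
from pathlib import Path


def find_class_providers(jar_dir, class_name):
    """Return the sorted file names of jars under jar_dir that contain class_name."""
    # A class such as cats.kernel.CommutativeSemigroup is stored in a jar
    # as the entry cats/kernel/CommutativeSemigroup.class
    entry = class_name.replace(".", "/") + ".class"
    providers = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    providers.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid zip archives
    return sorted(providers)
```

Calling `find_class_providers(some_dir, "cats.kernel.CommutativeSemigroup")` on the directory holding the cluster's jars (a location you would need to confirm for your runtime) lists every jar that bundles that class. More than one result, or a result whose `cats` version differs from the one `circe-generic` 0.14.12 expects, would be consistent with the conflict described above.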
   
   ## How to fix this issue
   
   I am going to raise a PR to remove `circe-generic` from the Spark connector 
module. `circe-generic` is used there to decode responses from the Pinot API. 
We can replace it with `jackson`, which is already used in other Pinot 
modules, so no additional dependency will be added to the project.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

