tsekityam opened a new issue, #15320:
URL: https://github.com/apache/pinot/issues/15320
## What did I do
1. Check out the code at #15315.
   The changes there are for authorization and are unrelated to the issue
   described in this report.
2. Build `pinot-spark-3-connector` with
   `./mvnw clean package -DskipTests -Papache-release -pl pinot-connectors/pinot-spark-3-connector/ -am`.
   I built the jars with the following dev container configuration:
   ```json
   {
     "name": "Java",
     "image": "mcr.microsoft.com/devcontainers/java:1-17-bookworm",
     "features": {
       "ghcr.io/devcontainers/features/java:1": {
         "version": "none",
         "installMaven": "true",
         "installGradle": "false"
       }
     }
   }
   ```
3. Create a Databricks cluster with Databricks Runtime 15.4 LTS (includes
   Apache Spark 3.5.0, Scala 2.12).
   Set the environment variable `JNAME=zulu17-ca-amd64` to enable Java 17 on
   the cluster.
4. Upload the following jars built above to the Databricks cluster:
   - `pinot-spark-3-connector-1.4.0-SNAPSHOT.jar`
   - `pinot-spark-common-1.4.0-SNAPSHOT.jar`
   - `pinot-spi-1.4.0-SNAPSHOT.jar`
5. In addition to the jars above, install the following dependencies on the
   cluster:
   - `org.apache.httpcomponents.client5:httpclient5:5.4.2`
   - `io.circe:circe-generic_2.12:0.14.12`
6. Run the following Python code on the cluster:
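   ```py
   # Connection details, table name and token are redacted as "xxxx".
   df = (
       spark
       .read
       .format("pinot")
       .option("controller", "xxxx")
       .option("broker", "xxxx")
       .option("table", "xxxx")
       .option("tableType", "offline")
       .option("authorization", "xxxx")
       .load()
   )
   # display() is the Databricks notebook built-in for rendering a DataFrame.
   display(df)
   ```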
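The `_2.12` suffix in `circe-generic_2.12` is the Scala binary version the artifact was compiled for, and it has to match the runtime's Scala 2.12. The sketch below is not part of the connector; the helper names `parse_coordinate` and `scala_binary_version` are hypothetical. It splits the Maven coordinates of the cluster libraries listed above and checks that suffix. It only covers the Scala binary version and says nothing about the versions of transitive dependencies such as `cats-kernel`.

```python
from typing import Optional, Tuple

# Scala binary version of the runtime (Databricks Runtime 15.4 LTS ships Scala 2.12).
RUNTIME_SCALA = "2.12"

# Maven coordinates of the extra cluster libraries from the reproduction steps.
LIBRARIES = [
    "org.apache.httpcomponents.client5:httpclient5:5.4.2",
    "io.circe:circe-generic_2.12:0.14.12",
]


def parse_coordinate(coordinate: str) -> Tuple[str, str, str]:
    """Split a 'group:artifact:version' Maven coordinate into its parts."""
    group, artifact, version = coordinate.split(":")
    return group, artifact, version


def scala_binary_version(artifact: str) -> Optional[str]:
    """Return the Scala suffix of an artifact id ('circe-generic_2.12' -> '2.12').

    Returns None for plain Java artifacts, which carry no '_<version>' suffix.
    """
    _, sep, suffix = artifact.rpartition("_")
    if sep and suffix.replace(".", "").isdigit():
        return suffix
    return None


if __name__ == "__main__":
    for coordinate in LIBRARIES:
        _, artifact, version = parse_coordinate(coordinate)
        scala = scala_binary_version(artifact)
        if scala is None:
            status = "Java artifact, no Scala suffix"
        elif scala == RUNTIME_SCALA:
            status = f"Scala {scala}, matches the runtime"
        else:
            status = f"Scala {scala}, does NOT match runtime Scala {RUNTIME_SCALA}"
        print(f"{artifact}:{version} -> {status}")
```

A suffix mismatch would point to a wrong artifact choice; a match, as here, leaves library version conflicts as the remaining suspect.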
## What did I see
```
Py4JJavaError: An error occurred while calling t.addCustomDisplayData.
: java.lang.NoClassDefFoundError: Could not initialize class io.circe.Decoder$
	at org.apache.pinot.connector.spark.common.PinotClusterClient$.$anonfun$getRoutingTableForQuery$1(PinotClusterClient.scala:201)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:197)
	at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTable(PinotClusterClient.scala:150)
	at org.apache.pinot.connector.spark.v3.datasource.PinotScan.planInputPartitions(PinotScan.scala:57)
	{skipped}
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.NoSuchMethodError: 'void cats.kernel.CommutativeSemigroup.$init$(cats.kernel.CommutativeSemigroup)' [in thread "Thread-125"]
	at cats.UnorderedFoldable$$anon$1.<init>(UnorderedFoldable.scala:131)
	{skipped}
```
## What did I expect
The Pinot table data is displayed without errors.
## What went wrong
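The error I saw is very similar to
https://github.com/typelevel/cats/issues/3628, so I think they are related.
It looks like the `cats` library on the cluster's system classpath conflicts
with the `cats` version required by `circe-generic`: the `NoSuchMethodError`
on `cats.kernel.CommutativeSemigroup.$init$` suggests that the `cats-kernel`
classes loaded at runtime are binary-incompatible with the ones `circe-generic`
was compiled against.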
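One way to check this hypothesis is to see which jars on the driver actually ship the conflicting class. The following is a minimal sketch using only the Python standard library; `find_class_providers` is a hypothetical helper, and the directories in the usage section are assumptions (on Databricks the runtime jars are commonly under `/databricks/jars`, but verify the location on your cluster). If more than one jar contains `cats/kernel/CommutativeSemigroup.class`, which copy the JVM loads depends on classpath and class loader ordering.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def find_class_providers(jar_dirs: Iterable[str], class_name: str) -> List[str]:
    """Return the jar files directly under jar_dirs that contain class_name.

    class_name is fully qualified, e.g. 'cats.kernel.CommutativeSemigroup';
    inside a jar it is stored as 'cats/kernel/CommutativeSemigroup.class'.
    Scala companion objects such as 'io.circe.Decoder$' keep their '$'.
    """
    entry = class_name.replace(".", "/") + ".class"
    providers = []
    for jar_dir in jar_dirs:
        directory = Path(jar_dir)
        if not directory.is_dir():
            continue  # skip locations that do not exist on this machine
        for jar in sorted(directory.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        providers.append(str(jar))
            except zipfile.BadZipFile:
                continue  # not a valid jar/zip archive
    return providers


if __name__ == "__main__":
    # Assumed locations: the runtime jars and the directory holding the
    # libraries installed on the cluster; adjust both for your environment.
    search_dirs = ["/databricks/jars", "/path/to/cluster/libraries"]
    for name in ("cats.kernel.CommutativeSemigroup", "io.circe.Decoder$"):
        jars = find_class_providers(search_dirs, name)
        print(f"{name}: {len(jars)} jar(s)")
        for jar in jars:
            print(f"  {jar}")
```

The scan is intentionally non-recursive; point it at each directory that contributes to the driver classpath.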
## How to fix this issue
I am going to raise a PR to remove `circe-generic` from the Spark connector
module. Here, `circe-generic` is used to decode responses from the Pinot API.
We can replace it with Jackson, which is already used in other Pinot modules,
so no additional dependency will be added to the project.
--
This is an automated message from the Apache Git Service.