Liu Cao created CASSANALYTICS-80:
------------------------------------
Summary: Kryo serialization for CassandraDataLayer is never used by Spark due to broadcast
Key: CASSANALYTICS-80
URL: https://issues.apache.org/jira/browse/CASSANALYTICS-80
Project: Apache Cassandra Analytics
Issue Type: Improvement
Reporter: Liu Cao
To reproduce, run any integration test for the bulk reader:
# Change the logging settings, or the log level of [this line|https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L759], to INFO.
# Run the following command: ./gradlew cassandra-analytics-integration-tests:test --tests "org.apache.cassandra.analytics.BulkReaderTest.testUsingSingleSidecarContactPoint" --debug > test.out

The test output contains:
```
Falling back to JDK serialization.
```
There is no sign of the log line [here|https://github.com/apache/cassandra-analytics/blob/a6bbbfa8689bd84705943b96444e8d8151376e27/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L828C26-L828C66] ("Serializing CassandraDataLayer with Kryo"), which should have appeared if Kryo were used.

Using the debugger, I confirmed that the Spark session was properly configured with Kryo: I evaluated the Spark session config at runtime and verified that the registrators and the serializer were set up correctly. This also shows up in the logs:
```
INFO KryoRegister: Setting kryo registrators: org.apache.cassandra.spark.bulkwriter.util.SbwKryoRegistrator,org.apache.cassandra.spark.KryoRegister
```
After stepping through the code, the issue seems to be that Spark's TorrentBroadcast uses JDK serialization anyway. Has anyone else tested the Kryo serialization setup?
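One possible explanation, offered as a hypothesis and not confirmed in this report: Spark serializes each stage's task binary (the RDD plus its function) with its closure serializer, which is always JavaSerializer regardless of spark.serializer, and only then broadcasts the resulting byte array through TorrentBroadcast. If CassandraDataLayer is reachable from that RDD (for example via its partitions or reader factory), its JDK serialization hook runs before Kryo is ever consulted, and Kryo only sees opaque bytes. The stdlib-only Java sketch below models that sequence. FakeDataLayer, javaSerialize, broadcastPayload and runDemo are hypothetical stand-ins, and it assumes the "Falling back to JDK serialization" message is emitted from a writeObject hook.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class SerializationPathDemo {
    // Records which serialization path touched the object (stands in for log lines).
    static final List<String> PATH_LOG = new ArrayList<>();

    // Hypothetical stand-in for CassandraDataLayer; assumes the JDK fallback is logged in writeObject.
    static class FakeDataLayer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String keyspace;

        FakeDataLayer(String keyspace) {
            this.keyspace = keyspace;
        }

        // Invoked only by JDK serialization (ObjectOutputStream), never by Kryo.
        private void writeObject(ObjectOutputStream out) throws IOException {
            PATH_LOG.add("Falling back to JDK serialization");
            out.defaultWriteObject();
        }
    }

    // Models a closure serializer that always uses JDK serialization.
    static byte[] javaSerialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Models a broadcast that receives only an already-serialized byte[];
    // whatever serializer it is configured with never sees FakeDataLayer itself.
    static byte[] broadcastPayload(byte[] alreadySerialized) {
        return alreadySerialized.clone();
    }

    static Object javaDeserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Runs the whole sequence and returns the recorded path entries.
    static List<String> runDemo() {
        PATH_LOG.clear();
        try {
            byte[] taskBinary = javaSerialize(new FakeDataLayer("ks"));
            byte[] payload = broadcastPayload(taskBinary);
            FakeDataLayer restored = (FakeDataLayer) javaDeserialize(payload);
            PATH_LOG.add("restored keyspace=" + restored.keyspace);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(PATH_LOG);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

If this model matches what happens in Spark, registering a Kryo serializer for CassandraDataLayer would not change the outcome, because the object is already JDK-serialized inside the task binary by the time the broadcast's serializer runs.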