[ https://issues.apache.org/jira/browse/CASSANALYTICS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010648#comment-18010648 ]
Francisco Guerrero commented on CASSANALYTICS-80:
-------------------------------------------------

This is something I've encountered as well. Kryo is never used for serialization on the bulk reader path. I think removing the code is a valid option to consider if it is never exercised.

> Kryo serialization for CassandraDataLayer is never used by Spark due to broadcast
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-80
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-80
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>            Reporter: Liu Cao
>            Priority: Normal
>
> To reproduce, simply run any integration test for the bulk reader:
> # Change the logging setting or the logging level of [this line|https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L759] to INFO
> # Run the following
> ./gradlew cassandra-analytics-integration-tests:test --tests "org.apache.cassandra.analytics.BulkReaderTest.testUsingSingleSidecarContactPoint" --debug > test.out
>
> We can see in the test output that
> ```
> Falling back to JDK serialization.
> ```
> And there is no sign of the log line [here|https://github.com/apache/cassandra-analytics/blob/a6bbbfa8689bd84705943b96444e8d8151376e27/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L828C26-L828C66] (Serializing CassandraDataLayer with Kryo) that should have appeared if Kryo were used.
>
> After setting up the debugger, I have confirmed that the Spark session was properly configured with Kryo: I evaluated the Spark session config at runtime to confirm that the proper registrator and serializer were set up.
> This also showed up in the logs:
> ```
> INFO KryoRegister: Setting kryo registrators: org.apache.cassandra.spark.bulkwriter.util.SbwKryoRegistrator,org.apache.cassandra.spark.KryoRegister
> ```
> After stepping through the code, the issue seems to be that the TorrentBroadcast in Spark decided to use JDK serialization anyway.
>
> Has anyone else tested the Kryo serialization setup? If Kryo is never used, maybe we shouldn't bother maintaining the complexity.
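
For anyone who wants to check which serialization path is taken without attaching a debugger, here is a minimal JDK-only sketch. It assumes the "Falling back to JDK serialization" line comes from the class's Java serialization hooks, which is not confirmed in this thread. `Layer` is a hypothetical stand-in for CassandraDataLayer, and the counters are illustrative only. The round trip goes through `ObjectOutputStream`/`ObjectInputStream`, which is the path a JDK-serializing caller would take, and the counters show that the private `writeObject`/`readObject` hooks are what get invoked on that path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializationPathProbe {

    // Hypothetical stand-in for CassandraDataLayer; not the real class.
    static class Layer implements Serializable {
        private static final long serialVersionUID = 1L;

        // Illustrative counters that record how often the JDK hooks run.
        static final AtomicInteger JDK_WRITES = new AtomicInteger();
        static final AtomicInteger JDK_READS = new AtomicInteger();

        final String datacenter;

        Layer(String datacenter) {
            this.datacenter = datacenter;
        }

        // Invoked by ObjectOutputStream when the object is written with JDK serialization.
        private void writeObject(ObjectOutputStream out) throws IOException {
            JDK_WRITES.incrementAndGet();
            out.defaultWriteObject();
        }

        // Invoked by ObjectInputStream when the object is read back with JDK serialization.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            JDK_READS.incrementAndGet();
            in.defaultReadObject();
        }
    }

    // Serializes and deserializes the object using plain JDK serialization.
    static Layer roundTrip(Layer layer) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(layer);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Layer) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Layer copy = roundTrip(new Layer("dc1"));
        System.out.println("datacenter=" + copy.datacenter
                + ", jdkWrites=" + Layer.JDK_WRITES.get()
                + ", jdkReads=" + Layer.JDK_READS.get());
    }
}
```

A similar counter (or an INFO-level log line) on the Kryo `write`/`read` side of the real class would make it visible in test output whether that path is ever reached.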