[ 
https://issues.apache.org/jira/browse/CASSANALYTICS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010648#comment-18010648
 ] 

Francisco Guerrero commented on CASSANALYTICS-80:
-------------------------------------------------

This is something I've encountered as well. Kryo is never used for
serialization on the bulk reader path. I think removing the code is a valid
option to consider if it is never exercised.
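
To check which path a serializable object actually takes, outside of Spark,
a minimal sketch like the one below can help. It uses only the JDK. The
class names (SerializationPathProbe, ProbeLayer) are hypothetical stand-ins
and not the real CassandraDataLayer code, and it assumes the "Falling back
to JDK serialization." message is emitted from a JDK hook such as
writeObject. The counters play the role of that log line: they only move
when java.io serialization is the path taken.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializationPathProbe {

    // Hypothetical stand-in for a data layer; NOT the real CassandraDataLayer.
    static class ProbeLayer implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static counters are not part of the serialized state; they only
        // record how often the JDK hooks ran in this JVM.
        static final AtomicInteger JDK_WRITES = new AtomicInteger();
        static final AtomicInteger JDK_READS = new AtomicInteger();

        final String keyspace;

        ProbeLayer(String keyspace) {
            this.keyspace = keyspace;
        }

        // Invoked by ObjectOutputStream, i.e. only on the JDK path.
        private void writeObject(ObjectOutputStream out) throws IOException {
            JDK_WRITES.incrementAndGet();
            out.defaultWriteObject();
        }

        // Invoked by ObjectInputStream, i.e. only on the JDK path.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            JDK_READS.incrementAndGet();
            in.defaultReadObject();
        }
    }

    // Round-trips the object through plain java.io serialization.
    static ProbeLayer roundTrip(ProbeLayer layer)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(layer);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ProbeLayer) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ProbeLayer copy = roundTrip(new ProbeLayer("ks"));
        System.out.println("keyspace=" + copy.keyspace
                + " jdkWrites=" + ProbeLayer.JDK_WRITES.get()
                + " jdkReads=" + ProbeLayer.JDK_READS.get());
    }
}
```

If a framework serializes such an object through another serializer instead,
the counters would be expected to stay at zero. That is the same signal as
the missing "Serializing CassandraDataLayer with Kryo" log line, only
inverted.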

> Kryo serialization for CassandraDataLayer is never used by Spark due to 
> broadcast
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-80
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-80
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>            Reporter: Liu Cao
>            Priority: Normal
>
> To reproduce, simply run any integration test for the bulk reader:
>  # Change the logging setting or the logging level of [this 
> line|https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L759]
>  to INFO
>  # Run the following
> ./gradlew cassandra-analytics-integration-tests:test --tests 
> "org.apache.cassandra.analytics.BulkReaderTest.testUsingSingleSidecarContactPoint"
>  --debug > test.out
>  
> We can see in the test output that
> ```
> Falling back to JDK serialization.
> ```
> And there is no sign of the log line 
> [here|https://github.com/apache/cassandra-analytics/blob/a6bbbfa8689bd84705943b96444e8d8151376e27/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L828C26-L828C66]
>  (Serializing CassandraDataLayer with Kryo) that should have appeared if Kryo 
> were used.
>  
> After setting up the debugger, I confirmed that the Spark session was 
> properly configured with Kryo: I evaluated the Spark session config at 
> runtime and verified that the proper registrators and serializer were set up. 
> This also showed up in the logs:
> ```
> INFO KryoRegister: Setting kryo registrators: 
> org.apache.cassandra.spark.bulkwriter.util.SbwKryoRegistrator,org.apache.cassandra.spark.KryoRegister
> ```
> After stepping through the code, the issue seems to be that 
> TorrentBroadcast in Spark uses JDK serialization anyway.
>  
> Has anyone else tested the Kryo serialization setup? If Kryo is never used, 
> maybe we shouldn't bother maintaining the complexity.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
