Liu Cao created CASSANALYTICS-80:
------------------------------------

             Summary: Kryo serialization for CassandraDataLayer is never used by Spark due to broadcast
                 Key: CASSANALYTICS-80
                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-80
             Project: Apache Cassandra Analytics
          Issue Type: Improvement
            Reporter: Liu Cao


To reproduce, simply run any integration test for the bulk reader:
 # Change the logging settings, or the log level of [this line|https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L759], to INFO
 # Run the following

./gradlew cassandra-analytics-integration-tests:test --tests "org.apache.cassandra.analytics.BulkReaderTest.testUsingSingleSidecarContactPoint" --debug > test.out

 

We can see in the test output that

```
Falling back to JDK serialization.
```

There is no sign of the log line [here|https://github.com/apache/cassandra-analytics/blob/a6bbbfa8689bd84705943b96444e8d8151376e27/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L828C26-L828C66] (Serializing CassandraDataLayer with Kryo), which should have appeared if Kryo were used.
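
For readers unfamiliar with why that message points at the JDK path: Java's built-in serialization calls a class's private writeObject hook, so a log statement placed there fires only when ObjectOutputStream (rather than a Kryo serializer) writes the object. Below is a minimal, self-contained sketch of that mechanism. DemoDataLayer, JDK_WRITES and serializeWithJdk are hypothetical names used for illustration only; this is not the actual CassandraDataLayer code, and it assumes the real class logs from a similar hook.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class JdkFallbackDemo
{
    // Counts how many times the JDK serialization hook was invoked (illustrative only)
    static final AtomicInteger JDK_WRITES = new AtomicInteger();

    // Hypothetical stand-in for a data layer; NOT the real CassandraDataLayer
    static class DemoDataLayer implements Serializable
    {
        private static final long serialVersionUID = 1L;
        private final String keyspace;

        DemoDataLayer(String keyspace)
        {
            this.keyspace = keyspace;
        }

        // ObjectOutputStream invokes this private hook reflectively;
        // a Kryo serializer would bypass it entirely
        private void writeObject(ObjectOutputStream out) throws IOException
        {
            JDK_WRITES.incrementAndGet();
            System.out.println("Falling back to JDK serialization.");
            out.defaultWriteObject();
        }
    }

    // Serializes a value with plain JDK serialization and returns the bytes
    static byte[] serializeWithJdk(Object value) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] payload = serializeWithJdk(new DemoDataLayer("demo_ks"));
        System.out.println("JDK writes observed: " + JDK_WRITES.get() + ", bytes: " + payload.length);
    }
}
```

Seeing the hook's message in the output therefore suggests the object went through ObjectOutputStream, regardless of which serializer the session was configured with.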

 

After setting up the debugger, I confirmed that the Spark session was properly configured with Kryo: I evaluated the Spark session config at runtime and verified that the proper registrator and serializer were set up. This also showed up in the logs:

```
INFO KryoRegister: Setting kryo registrators: org.apache.cassandra.spark.bulkwriter.util.SbwKryoRegistrator,org.apache.cassandra.spark.KryoRegister
```

After stepping through the code, the issue seems to be that TorrentBroadcast in Spark uses JDK serialization anyway, despite the Kryo configuration.

 

Has anyone else tested the Kryo serialization setup?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
