Hi Sanal,

This was quite unknown territory for me as well. The Debezium connector was implemented in such a way that it loaded the schema once, but the handler never saw any schema changes that happened after that load. The Debezium connector is quite special because it runs as a standalone program (a different JVM). If you change the schema on your node, the change is applied in the context of the Cassandra JVM, but the connector knows nothing about it because nobody notified it.

The obvious result was that when the connector detected a new commit log file to process, it would still see "cdc_enabled" as false for the table, because whether a table is CDC-enabled or not is not serialised as part of a Mutation. It lives somewhere in the table metadata, in PartitionUpdate or similar, and from the connector's point of view that metadata never changed. This is a slightly harder concept to grasp, so feel free to go over it mentally a few times.
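To make the stale-view problem concrete, here is a toy sketch in plain Java. It is only a model under my own assumptions: `StaleSchemaDemo`, `SchemaView` and the table name `ks.events` are made up for illustration and are not Cassandra or Debezium classes. Two objects stand in for the two JVMs, each holding its own copy of the per-table CDC flag.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Cassandra or Debezium classes.
final class StaleSchemaDemo {

    /** One process's private view of which tables have CDC enabled. */
    static final class SchemaView {
        private final Map<String, Boolean> cdcByTable = new HashMap<>();

        void setCdc(String table, boolean enabled) {
            cdcByTable.put(table, enabled);
        }

        boolean isCdcEnabled(String table) {
            // Unknown tables are treated as not CDC-enabled.
            return cdcByTable.getOrDefault(table, false);
        }

        /** Independent copy, taken once, like a schema load at startup. */
        SchemaView snapshot() {
            SchemaView copy = new SchemaView();
            copy.cdcByTable.putAll(cdcByTable);
            return copy;
        }
    }

    public static void main(String[] args) {
        SchemaView cassandraJvm = new SchemaView();
        cassandraJvm.setCdc("ks.events", false);

        // The connector starts and loads the schema once.
        SchemaView connectorJvm = cassandraJvm.snapshot();

        // Later, CDC is turned on for the table, but only in the Cassandra JVM.
        cassandraJvm.setCdc("ks.events", true);

        System.out.println("cassandra sees cdc=" + cassandraJvm.isCdcEnabled("ks.events"));
        System.out.println("connector sees cdc=" + connectorJvm.isCdcEnabled("ks.events"));

        // The connector's view only catches up once the change is forwarded to it.
        connectorJvm.setCdc("ks.events", true);
        System.out.println("after forwarding, connector sees cdc="
                + connectorJvm.isCdcEnabled("ks.events"));
    }
}
```

The point of the model is just that the second view is a copy and not a reference: nothing that happens to the first one reaches it unless some mechanism carries the change across.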
Because the problem is fairly involved, I wrote a report for the Debezium team so they could fully understand what is going on. You can read about it in depth here (1) and here (2).

So, I load the schema only on the connector's startup, but after that I need to be notified somehow about the changes that happen in the Cassandra JVM so I can act accordingly in the connector. The solution I came up with is a schema change listener in the driver. It reacts to schema changes made in Cassandra, and I apply them to the connector's "local" copy of the Cassandra schema, so that the metadata in the connector's JVM contains the changes I am interested in.

If you somehow manage to run your connector in the same JVM as Cassandra, I think you would not have this kind of problem. I guess the same would hold if you ran your handler as a JVM agent attached to Cassandra.

(1) https://github.com/debezium/debezium-connector-cassandra/blob/ac43b7797c084c3e67cedde3662af1e58de8a4c2/REPORT.adoc
(2) https://github.com/debezium/debezium-connector-cassandra/blob/ac43b7797c084c3e67cedde3662af1e58de8a4c2/REPORT_2.adoc

On Wed, 4 May 2022 at 14:30, Sanal Vasudevan <get2sa...@gmail.com> wrote:
>
> Hi Stefan,
>
> First of all, many thanks for responding to my email.
> Let me explain my journey so far with this. I could not find any
> documentation for this, so it is good to have someone to discuss this :)
>
> The program which I had earlier for version 3.9 did the following:
> 3.9:
> Config.setClientMode(true);
>
> Porting to 3.11, I used the following:
> DatabaseDescriptor.clientInitialization();
>
> Now with 4.0, when I use DatabaseDescriptor.clientInitialization(), it throws
> up an error leading something as follows:
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.config.DatabaseDescriptor.getMaxMutationSize(DatabaseDescriptor.java:1959)
>     at org.apache.cassandra.db.IMutation.<clinit>(IMutation.java:29)
>     ... 3 more
>
> Then I tried
> DatabaseDescriptor.daemonInitialization()
> with system property, -Dcassandra.config=file:///path/to/cassandra.yaml
>
> After this, it errored out for property cassandra.storagedir not set. I set
> this to a dummy value,
> System.setProperty("cassandra.storagedir","/tmp");
>
> With this, I was able to run the standalone program without errors but I was
> not able to read mutations from user tables.
> After loading Schema using Schema.instance.load(keyspace), I was able to read
> mutations from the commit logs.
>
> I looked at the code that you've implemented, I have some questions:
> 1) For Cassandra 3 and Cassandra 4, you have used
> DatabaseDescriptor.toolInitialization()
> May I ask if external applications should always use
> DatabaseDescriptor.toolInitialization() ?
>
> 2) In your code, keyspace metadata (table metadata and column metadata) is
> not constructed and loaded into the Schema instance.
> You are using Schema.instance.loadFromDisk(false)
> Is this the preferred way to load the schema?
>
> I will try out your approach and get back soon.
>
> Again, many thanks.
>
> Best regards
> Sanal
>
> On Wed, May 4, 2022 at 2:44 PM Stefan Miklosovic
> <stefan.mikloso...@instaclustr.com> wrote:
>>
>> Hi Sanal,
>>
>> I have recently updated a project called Debezium and its Cassandra
>> connector to work with Cassandra 4 (1)
>>
>> The implementation of CommitLogReadHandler is here (2)
>>
>> (1) https://github.com/debezium/debezium-connector-cassandra
>> (2) https://github.com/debezium/debezium-connector-cassandra/blob/main/cassandra-4/src/main/java/io/debezium/connector/cassandra/Cassandra4CommitLogReadHandlerImpl.java
>>
>> Feel free to reach me privately or here on ML if you have any specific
>> questions.
>>
>> Regards
>>
>> Stefan
>>
>> On Wed, 4 May 2022 at 01:40, Sanal Vasudevan <get2sa...@gmail.com> wrote:
>> >
>> > Hi Folks,
>> >
>> > I have a standalone Java application that implements the interface
>> > CommitLogReadHandler to read cassandra commit log files generated by
>> > Cassandra 3.11.
>> > I recently tried to use this to read the commit logs generated by
>> > Cassandra 4, but it does not work.
>> > Has anyone tried to implement CommitLogReadHandler for Cassandra 4 or is
>> > there a better way to read/parse Cassandra 4 commit logs?
>> > Any help would be appreciated.
>> >
>> > Thanks!
>> >
>> > Best regards
>> > Sanal
>
> --
> Sanal Vasudevan Nair