Depending on the use case, creating separate prepared statements for each combination of set / unset values in large INSERT/UPDATE statements may be prohibitive.
Instead, you can look into driver level support for UNSET values. Requires Cassandra 2.2 or later IIRC. See: Java Driver: https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding Python Driver: https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values Node Driver: https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: > You say the events are incremental updates. I am interpreting this to mean > only some columns are updated. Others should keep their original values. > > You are correct that inserting null creates a tombstone. > > Can you only insert the columns that actually have new values? Just skip > the columns with no information. (Make the insert generator a bit smarter.) > > Create table happening (id text primary key, event text, a text, b text, c > text); > Insert into table happening (id, event, a, b, c) values ("MainEvent","The > most complete info we have right now","Priceless","10 pm","Grand Ballroom"); > -- b changes > Insert into happening (id, b) values ("MainEvent","9:30 pm"); > > > Sean Durity > > > -----Original Message----- > From: Tomas Bartalos <tomas.barta...@gmail.com> > Sent: Thursday, December 27, 2018 9:27 AM > To: user@cassandra.apache.org > Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values > > Hello, > > I’d start with describing my use case and how I’d like to use Cassandra to > solve my storage needs. > We're processing a stream of events for various happenings. Every event > have a unique happening_id. > One happening may have many events, usually ~ 20-100 events. I’d like to > store only the latest event for the same happening (Event is an incremental > update and it contains all up-to date data about happening). > Technically the events are streamed from Kafka, processed with Spark an > saved to Cassandra. > In Cassandra we use upserts (insert with same primary key). So far so > good, however there comes the tombstone... > > When I’m inserting field with NULL value, Cassandra creates tombstone for > this field. As I understood this is due to space efficiency, Cassandra > doesn’t have to remember there is a NULL value, she just deletes the > respective column and a delete creates a ... tombstone. > I was hoping there could be an option to tell Cassandra not to be so space > effective and store “unset" info without generating tombstones. > Something similar to inserting empty strings instead of null values: > > CREATE TABLE happening (id text PRIMARY KEY, event text); insert into > happening (‘1’, ‘event1’); — tombstone is generated insert into happening > (‘1’, null); — tombstone is not generated insert into happening (‘1’, '’); > > Possible solutions: > 1. Disable tombstones with gc_grace_seconds = 0 or set to reasonable low > value (1 hour ?) . Not good, since phantom data may re-appear 2. ignore > NULLs on spark side with “spark.cassandra.output.ignoreNulls=true”. Not > good since this will never overwrite previously inserted event field with > “empty” one. > 3. On inserts with spark, find all NULL values and replace them with > “empty” equivalent (empty string for text, 0 for integer). Very inefficient > and problematic to find “empty” equivalent for some data types. > > Until tombstones appeared Cassandra was the right fit for our use case, > however now I’m not sure if we’re heading the right direction. > Could you please give me some advice how to solve this problem ? > > Thank you, > Tomas > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > > > ________________________________ > > The information in this Internet Email is confidential and may be legally > privileged. It is intended solely for the addressee. Access to this Email > by anyone else is unauthorized. If you are not the intended recipient, any > disclosure, copying, distribution or any action taken or omitted to be > taken in reliance on it, is prohibited and may be unlawful. When addressed > to our clients any opinions or advice contained in this Email are subject > to the terms and conditions expressed in any applicable governing The Home > Depot terms of business or client engagement letter. The Home Depot > disclaims all responsibility and liability for the accuracy and content of > this attachment and for any damages or losses arising from any > inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other > items of a destructive nature, which may be contained in this attachment > and shall not be liable for direct, indirect, consequential or special > damages in connection with this e-mail message or its attachment. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org >