Depending on the use case, creating separate prepared statements for each
combination of set / unset values in large INSERT/UPDATE statements may be
prohibitive.
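
To make that concrete, here is a rough sketch (plain Python, no driver needed;
the `build_insert` helper is hypothetical, not from any library) of generating
an INSERT that names only the columns with values, as Sean suggests below.
Each distinct combination of set columns yields distinct CQL text, which is
exactly why preparing them all up front can be prohibitive:

```python
def build_insert(table, row):
    """Build an INSERT that names only the columns that actually have values.

    `row` is a dict of column -> value; None means "no information, skip it".
    Returns the CQL text plus the parameter list for positional binding.
    """
    cols = [c for c, v in row.items() if v is not None]
    cql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(cols), ", ".join("%s" for _ in cols))
    return cql, [row[c] for c in cols]

# Only id and b carry information, so only those columns appear:
# build_insert("happening", {"id": "MainEvent", "a": None, "b": "9:30 pm"})
# -> ("INSERT INTO happening (id, b) VALUES (%s, %s)", ["MainEvent", "9:30 pm"])
```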

Instead, you can look into driver-level support for UNSET values. This
requires Cassandra 2.2 or later (native protocol v4).

See:
Java Driver:
https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
Python Driver:
https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
Node Driver:
https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
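
A minimal sketch of the Python-driver approach: `UNSET_VALUE` is a real
sentinel exported by `cassandra.query` in cassandra-driver 2.6+, while the
`with_unset` helper and the `session`/`prepared` names in the comment are
illustrative only:

```python
# cassandra-driver exposes an UNSET_VALUE sentinel (protocol v4 / Cassandra 2.2+);
# binding it leaves the column untouched on the server -- no write, no tombstone.
try:
    from cassandra.query import UNSET_VALUE
except ImportError:  # stand-in sentinel so the helper can be exercised offline
    UNSET_VALUE = object()

def with_unset(values):
    """Map Python None to UNSET_VALUE so NULLs stop producing tombstones."""
    return tuple(UNSET_VALUE if v is None else v for v in values)

# Against a live cluster (session/prepared are hypothetical names):
#   prepared = session.prepare(
#       "INSERT INTO happening (id, event, a, b, c) VALUES (?, ?, ?, ?, ?)")
#   session.execute(prepared, with_unset(("MainEvent", None, None, "9:30 pm", None)))
```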

On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <sean_r_dur...@homedepot.com>
wrote:

> You say the events are incremental updates. I am interpreting this to mean
> only some columns are updated. Others should keep their original values.
>
> You are correct that inserting null creates a tombstone.
>
> Can you only insert the columns that actually have new values? Just skip
> the columns with no information. (Make the insert generator a bit smarter.)
>
> Create table happening (id text primary key, event text, a text, b text, c
> text);
> Insert into happening (id, event, a, b, c) values ('MainEvent', 'The
> most complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
> -- b changes
> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>
>
> Sean Durity
>
>
> -----Original Message-----
> From: Tomas Bartalos <tomas.barta...@gmail.com>
> Sent: Thursday, December 27, 2018 9:27 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>
> Hello,
>
> I’d start by describing my use case and how I’d like to use Cassandra to
> solve my storage needs.
> We're processing a stream of events for various happenings. Every event
> has a unique happening_id.
> One happening may have many events, usually ~20-100. I’d like to
> store only the latest event for each happening (an event is an incremental
> update and contains all up-to-date data about the happening).
> Technically, the events are streamed from Kafka, processed with Spark, and
> saved to Cassandra.
> In Cassandra we use upserts (inserts with the same primary key). So far so
> good, but then comes the tombstone...
>
> When I insert a field with a NULL value, Cassandra creates a tombstone for
> that field. As I understand it, this is done for space efficiency: Cassandra
> doesn’t have to remember there is a NULL value, it just deletes the
> respective column, and a delete creates a ... tombstone.
> I was hoping there could be an option to tell Cassandra not to be so space
> efficient and to store the "unset" info without generating tombstones.
> Something similar to inserting empty strings instead of null values:
>
> CREATE TABLE happening (id text PRIMARY KEY, event text);
> insert into happening (id, event) values ('1', 'event1');
> insert into happening (id, event) values ('1', null); -- tombstone is generated
> insert into happening (id, event) values ('1', '');   -- tombstone is not generated
>
> Possible solutions:
> 1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably
> low value (1 hour?). Not good, since phantom data may re-appear.
> 2. Ignore NULLs on the Spark side with "spark.cassandra.output.ignoreNulls=true".
> Not good, since this will never overwrite a previously inserted event field
> with an "empty" one.
> 3. On inserts with Spark, find all NULL values and replace them with an
> "empty" equivalent (empty string for text, 0 for integer). Very inefficient,
> and problematic to find an "empty" equivalent for some data types.
>
> Until tombstones appeared, Cassandra was the right fit for our use case,
> but now I’m not sure if we’re heading in the right direction.
> Could you please give me some advice on how to solve this problem?
>
> Thank you,
> Tomas
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>
