Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello,

I believe your approach is the same as using Spark with
"spark.cassandra.output.ignoreNulls=true".
This will not cover the situation when a value has to be overwritten with
null.

I found one possible solution - change the schema to keep only the primary key
fields and move all other fields into a frozen UDT:
create table (year, month, day, id, event frozen<...>, primary key((year,
month, day), id))
In this way anything that is null inside the event doesn't create a tombstone,
since the event is serialized to a BLOB.
The penalty is the need to deserialize the whole Event when selecting only a
few columns.
Can anyone confirm whether this is a good solution performance-wise?
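
For illustration, a minimal sketch of this layout with the Java driver (keyspace,
type and field names here are made up, not the real schema):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class FrozenEventSchema {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) {
            // All non-key fields live inside one UDT; the field list is illustrative.
            session.execute("CREATE TYPE IF NOT EXISTS event_data ("
                    + "total_amount decimal, items frozen<list<text>>, "
                    + "purchase_time timestamp, specials text)");
            // The event is stored as a single frozen cell, so a null field inside it
            // is part of the serialized blob and never becomes a per-column tombstone.
            session.execute("CREATE TABLE IF NOT EXISTS happening_by_day ("
                    + "year int, month int, day int, id text, event frozen<event_data>, "
                    + "PRIMARY KEY ((year, month, day), id))");
        }
    }
}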

Thank you,


Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just
for this requirement, Achilles has a working solution for many years using
INSERT_NOT_NULL_FIELDS strategy:

https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

Or you can use the Update API, which by design only performs updates on
non-null fields:
https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity


Behind the scenes, for each new combination of columns in an INSERT INTO
table(x,y,z) statement, Achilles will check its prepared statement cache and, if
the statement does not exist yet, create a new prepared statement and put it
into the cache for later re-use.
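
For readers who don't use Achilles, a rough sketch of that caching idea with the
plain Java driver (this is not Achilles' actual code, just the general approach):

import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class NotNullInsertCache {
    private final Session session;
    private final Map<String, PreparedStatement> cache = new ConcurrentHashMap<>();

    public NotNullInsertCache(Session session) {
        this.session = session;
    }

    public void insertNonNull(String table, Map<String, Object> nonNullColumns) {
        // Sort the column names so {a,b} and {b,a} reuse the same prepared statement.
        TreeMap<String, Object> cols = new TreeMap<>(nonNullColumns);
        String cacheKey = table + "(" + String.join(",", cols.keySet()) + ")";
        PreparedStatement ps = cache.computeIfAbsent(cacheKey, key -> {
            String names = String.join(", ", cols.keySet());
            String marks = String.join(", ", Collections.nCopies(cols.size(), "?"));
            return session.prepare("INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")");
        });
        BoundStatement bound = ps.bind(cols.values().toArray());
        session.execute(bound);
    }
}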

Disclaimer: I'm the creator of Achilles




Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Tomas Bartalos
Hello,

The problem is I can't know the combination of set/unset values. From my
perspective every value should be set. The event from Kafka represents the
complete state of the happening at a certain point in time. In my table I
want to store the latest event, i.e. the most recent state of the happening
(in this table I don't care about the history). Actually I used the wrong
expression, since it's just the opposite of an "incremental update": every event
carries all the data (state) for a specific point in time.

The event is represented as a nested JSON structure. Top-level elements of
the JSON are table fields with types like text, boolean, timestamp and list, and
the nested elements are UDT fields.

Simplified example:
There is a new purchase for the happening, event:
{total_amount: 50, items : [A, B, C, new_item], purchase_time : '2018-12-27
13:30', specials: null, customer : {... }, fare_amount,...}
I don't know what actually happened for this event: maybe there is a new
item purchased, maybe some customer info has been changed, maybe the
specials have been revoked and I have to reset them. I just need to store
the state as it arrived from Kafka; there might already be an event for
this happening saved before, or maybe this is the first one.

BR,
Tomas



Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Eric Stevens
Depending on the use case, creating separate prepared statements for each
combination of set / unset values in large INSERT/UPDATE statements may be
prohibitive.

Instead, you can look into driver-level support for UNSET values. Requires
Cassandra 2.2 or later, IIRC.

See:
Java Driver:
https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
Python Driver:
https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
Node Driver:
https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
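
For example, with the Java driver 3.x and native protocol v4, a sketch like this
writes only the bound columns and leaves the rest untouched (table and column
names are borrowed from the "happening" example elsewhere in this thread):

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class UnsetInsertExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) {
            PreparedStatement ps = session.prepare(
                    "INSERT INTO happening (id, event, a, b, c) VALUES (?, ?, ?, ?, ?)");
            // Bind only the columns that actually have a value; with protocol v4
            // (Cassandra 2.2+) the unbound markers are sent as UNSET, so the other
            // columns keep their previous values and no tombstones are written.
            BoundStatement bound = ps.bind()
                    .setString("id", "MainEvent")
                    .setString("b", "9:30 pm");
            session.execute(bound);
        }
    }
}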


RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only 
some columns are updated. Others should keep their original values.

You are correct that inserting null creates a tombstone.

Can you only insert the columns that actually have new values? Just skip the 
columns with no information. (Make the insert generator a bit smarter.)

Create table happening (id text primary key, event text, a text, b text, c
text);
Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
-- b changes
Insert into happening (id, b) values ('MainEvent', '9:30 pm');


Sean Durity




Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Tomas Bartalos
Hello,

I’d start with describing my use case and how I’d like to use Cassandra to 
solve my storage needs.
We're processing a stream of events for various happenings. Every event has a
unique happening_id.
One happening may have many events, usually ~ 20-100 events. I’d like to store
only the latest event for the same happening (an event is an incremental update
and contains all up-to-date data about the happening).
Technically the events are streamed from Kafka, processed with Spark and saved
to Cassandra.
In Cassandra we use upserts (insert with the same primary key). So far so good;
however, here comes the tombstone...

When I’m inserting a field with a NULL value, Cassandra creates a tombstone for
this field. As I understood it, this is due to space efficiency: Cassandra doesn’t
have to remember there is a NULL value, it just deletes the respective column, and
a delete creates a ... tombstone.
I was hoping there could be an option to tell Cassandra not to be so
space-efficient and to store "unset" info without generating tombstones.
Something similar to inserting empty strings instead of null values:

CREATE TABLE happening (id text PRIMARY KEY, event text);
insert into happening (id, event) values ('1', 'event1');
-- tombstone is generated
insert into happening (id, event) values ('1', null);
-- tombstone is not generated
insert into happening (id, event) values ('1', '');

Possible solutions:
1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably low
value (1 hour?). Not good, since phantom data may re-appear.
2. Ignore NULLs on the Spark side with "spark.cassandra.output.ignoreNulls=true"
(see the sketch after this list). Not good, since this will never overwrite a
previously inserted event field with an "empty" one.
3. On inserts with Spark, find all NULL values and replace them with an "empty"
equivalent (empty string for text, 0 for integer). Very inefficient, and
problematic to find an "empty" equivalent for some data types.
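
For reference, a minimal sketch of how option 2 is wired up with the Spark
Cassandra connector (the contact point and app name are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class IgnoreNullsExample {
    public static void main(String[] args) {
        // Tell the connector to skip null fields on write, so they neither
        // overwrite existing values nor create tombstones.
        SparkConf conf = new SparkConf()
                .setAppName("ignore-nulls-sketch")
                .set("spark.cassandra.connection.host", "127.0.0.1")
                .set("spark.cassandra.output.ignoreNulls", "true");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // ... read the events from Kafka and write the DataFrame to Cassandra
        // with format "org.apache.spark.sql.cassandra" as usual.
        spark.stop();
    }
}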

Until tombstones appeared, Cassandra was the right fit for our use case; however,
now I’m not sure if we’re heading in the right direction.
Could you please give me some advice on how to solve this problem?

Thank you,
Tomas



RE: Inserting null values

2015-05-07 Thread Peer, Oded
I've added an option to trunk that prevents tombstone creation when using
PreparedStatements; see CASSANDRA-7304.

The problem is having tombstones in regular columns.
When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all, not only the cells specified in 
the query) in the relevant rows while counting tombstone cells 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- creates a ColumnFamily object instance with the rows
- filters the selected columns from the internal CF 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result

If you have many unnecessary tombstones, you read many unnecessary cells.







Re: Inserting null values

2015-05-06 Thread Eric Stevens
I agree that inserting null is not as good as not inserting that column at
all when you have confidence that you are not shadowing any underlying
data. But pragmatically speaking it really doesn't sound like a small
number of incidental nulls/tombstones (< 20% of columns, otherwise
CASSANDRA-3442 takes over) is going to have any performance impact either
in your query patterns or in compaction in any practical sense.

If INSERT of null values is problematic for small portions of your data,
then it stands to reason that an INSERT option containing an instruction to
prevent tombstone creation would be an important performance optimization
(and would also address the fact that non-null collections also generate
tombstones on INSERT as well).  INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to
clustering scans right?  I.E. tombstones don't count against those
thresholds if they are not part of the clustering key column being
considered for the non-EQ relation?  The documentation certainly implies so:

tombstone_warn_threshold
http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold
(Default: 1000) The maximum number of tombstones a query can scan before
warning.

tombstone_failure_threshold
http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold
(Default: 100000) The maximum number of tombstones a query can scan before
aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens migh...@gmail.com wrote:

 In the end, inserting a tombstone into a non-clustered column shouldn't
 be appreciably worse (if it is at all) than inserting a value instead.  Or
 am I missing something here?


 There's thresholds (log messages, etc.) which operate on tombstone counts
 over a certain number, but not on column counts over the same number.

 Given that tombstones are often smaller than data columns, sorta hard to
 understand conceptually?

 =Rob




RE: Inserting null values

2015-04-29 Thread Peer, Oded
Inserting a null value creates a tombstone. Tombstones can have major 
performance implications.
You can see the tombstones using sstable2json.
If you have a small number of records with null values this seems OK, otherwise 
I recommend using the QueryBuilder (for Java clients) and waiting for 
https://issues.apache.org/jira/browse/CASSANDRA-7304
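
For illustration, a sketch of the QueryBuilder approach, where only non-null
values are added to the INSERT so the missing columns are simply not written
(table and column names are made up):

import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class NullSafeInsert {
    static void insertHappening(Session session, String id, String event, String a, String b) {
        Insert insert = QueryBuilder.insertInto("happening").value("id", id);
        if (event != null) {
            insert.value("event", event);
        }
        if (a != null) {
            insert.value("a", a);
        }
        if (b != null) {
            insert.value("b", b);
        }
        session.execute(insert);
    }
}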





Re: Inserting null values

2015-04-29 Thread DuyHai Doan
auto promotion mode on

The problem of NULL inserts was already solved a long time ago with the Insert
Strategy in Achilles:
https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy

/auto promotion off

However, it's nice to see there will be a flag on the protocol side to
handle this problem.







Re: Inserting null values

2015-04-29 Thread Ali Akhtar
Have you considered adding a 'toSafe' method which checks if the item is
null, and if so, returns a default value? E.g. String foo = safe(bar, "");
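If it helps, a tiny generic version of that helper (names mine, just an
illustration) could be:

    public final class Defaults {

        private Defaults() {}

        // Returns the value itself, or the supplied fallback when the value is null.
        public static <T> T safe(T value, T fallback) {
            return value != null ? value : fallback;
        }
    }

    // Usage: bind an empty string instead of null so no tombstone is written,
    // e.g. String name = Defaults.safe(possiblyNullName, "");

Note that this stores a sentinel value rather than leaving the column out, so
readers have to treat the empty string as "no value".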
On Apr 29, 2015 3:14 PM, Matthew Johnson matt.john...@algomi.com wrote:

 Hi all,



 I have some fields that I am storing into Cassandra, but some of them
 could be null at any given point. As there are quite a lot of them, it
 makes the code much more readable if I don’t check each one for null before
 adding it to the INSERT.



 I can see a few Jiras around CQL 3 supporting inserting nulls:



 https://issues.apache.org/jira/browse/CASSANDRA-3783

 https://issues.apache.org/jira/browse/CASSANDRA-5648



 But I have tested inserting null and it seems to work fine (when querying
 the table with cqlsh, it shows up as a red lowercase *null*).



 Are there any obvious pitfalls to look out for that I have missed? Could
 it be a performance concern to insert a row with some nulls, as opposed to
 checking the values first and inserting the row and just omitting those
 columns?



 Thanks!

 Matt





Inserting null values

2015-04-29 Thread Matthew Johnson
Hi all,



I have some fields that I am storing into Cassandra, but some of them could
be null at any given point. As there are quite a lot of them, it makes the
code much more readable if I don’t check each one for null before adding it
to the INSERT.



I can see a few Jiras around CQL 3 supporting inserting nulls:



https://issues.apache.org/jira/browse/CASSANDRA-3783

https://issues.apache.org/jira/browse/CASSANDRA-5648



But I have tested inserting null and it seems to work fine (when querying
the table with cqlsh, it shows up as a red lowercase *null*).



Are there any obvious pitfalls to look out for that I have missed? Could it
be a performance concern to insert a row with some nulls, as opposed to
checking the values first and inserting the row and just omitting those
columns?



Thanks!

Matt


Re: Inserting null values

2015-04-29 Thread Eric Stevens
Correct me if I'm wrong, but tombstones are only really problematic if you
have them going into clustering keys, then perform a range select on that
column, right (assuming it's not a symptom of the antipattern of
indefinitely overwriting the same value)?  I.E. you're deleting clusters
off of a partition.  A tombstone isn't any more costly, and in some ways
less costly than a normal column (it's a smaller size at rest than, say,
inserting an empty string or other default value as someone suggested).

Tombstones stay around a little longer post-compaction than other values,
so that's a downside, but they also would drop off the record as if it had
never been set on the next compaction after gc grace period.

Tombstones aren't intrinsically bad, but they can have some bad properties
in certain situations.  This doesn't strike me as one of them.  If you have
a way to avoid inserting null when you know you aren't occluding an
underlying value, that would be ideal.  But because the tombstone would sit
adjacent on disk to other values from the same insert, even if you were on
platters, the drive head is *already positioned* over the tombstone
location when it's read, because it read the prior value and subsequent
value which were written during the same insert.

In the end, inserting a tombstone into a non-clustered column shouldn't be
appreciably worse (if it is at all) than inserting a value instead.  Or am
I missing something here?
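To make the clustering-key case concrete, here is a rough sketch (schema,
counts and contact point all invented) of the pattern that does hurt, written
against the Java driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TombstoneScanDemo {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                session.execute("CREATE TABLE IF NOT EXISTS demo.events ("
                        + "day text, seq int, payload text, PRIMARY KEY (day, seq))");

                // One partition with many clustered rows...
                for (int i = 0; i < 10000; i++) {
                    session.execute("INSERT INTO demo.events (day, seq, payload) "
                            + "VALUES ('2015-04-29', ?, 'x')", i);
                }
                // ...most of which are then deleted, leaving row tombstones behind.
                for (int i = 0; i < 9900; i++) {
                    session.execute("DELETE FROM demo.events WHERE day = '2015-04-29' AND seq = ?", i);
                }

                // This range scan has to step over all those tombstones to find the few
                // live rows, which is what tombstone_warn_threshold complains about.
                session.execute("SELECT * FROM demo.events WHERE day = '2015-04-29' AND seq > 0 LIMIT 100");
            }
        }
    }

A read of a single row whose regular columns were nulled out never has to walk
anything like that many tombstones, which is why it behaves so differently.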

On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Thank you all for the advice!



 I have decided to use the Insert query builder (
 *com.datastax.driver.core.querybuilder.Insert*) which allows me to
 dynamically insert as many or as few columns as I need, and doesn’t require
 multiple prepared statements. Then, I will look at Ali’s suggestion – I
 will create a small helper method like ‘addToInsertIfNotNull’ and pump all
 my values into that, which will then filter out the ones that are null.
 Should keep the code nice and neat – I will feed back if I find any
 problems with this approach (but please jump in if you have already spotted
 any :)).



 Thanks!

 Matt



 *From:* Robert Wille [mailto:rwi...@fold3.com]
 *Sent:* 29 April 2015 15:16
 *To:* user@cassandra.apache.org
 *Subject:* Re: Inserting null values



 I’ve come across the same thing. I have a table with at least half a dozen
 columns that could be null, in any combination. Having a prepared statement
 for each permutation of null columns just isn’t going to happen. I don’t
 want to build custom queries each time because I have a really cool system
 of managing my queries that relies on them being prepared.



 Fortunately for me, I should have at most a handful of tombstones in each
 partition, and most of my records are written exactly once. So, I just let
 the tombstones get written and they’ll eventually get compacted out and
 life will go on.



 It’s annoying and not ideal, but what can you do?



 On Apr 29, 2015, at 2:36 AM, Matthew Johnson matt.john...@algomi.com
 wrote:



 Hi all,



 I have some fields that I am storing into Cassandra, but some of them
 could be null at any given point. As there are quite a lot of them, it
 makes the code much more readable if I don’t check each one for null before
 adding it to the INSERT.



 I can see a few Jiras around CQL 3 supporting inserting nulls:



 https://issues.apache.org/jira/browse/CASSANDRA-3783

 https://issues.apache.org/jira/browse/CASSANDRA-5648



 But I have tested inserting null and it seems to work fine (when querying
 the table with cqlsh, it shows up as a red lowercase *null*).



 Are there any obvious pitfalls to look out for that I have missed? Could
 it be a performance concern to insert a row with some nulls, as opposed to
 checking the values first and inserting the row and just omitting those
 columns?



 Thanks!

 Matt





Re: Inserting null values

2015-04-29 Thread Jonathan Haddad
Enough tombstones can inflate the size of an SSTable causing issues during
compaction (imagine a multi tb sstable w/ 99% tombstones) even if there's
no clustering key defined.

Perhaps an edge case, but worth considering.

On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens migh...@gmail.com wrote:

 Correct me if I'm wrong, but tombstones are only really problematic if you
 have them going into clustering keys, then perform a range select on that
 column, right (assuming it's not a symptom of the antipattern of
 indefinitely overwriting the same value)?  I.E. you're deleting clusters
 off of a partition.  A tombstone isn't any more costly, and in some ways
 less costly than a normal column (it's a smaller size at rest than, say,
 inserting an empty string or other default value as someone suggested).

 Tombstones stay around a little longer post-compaction than other values,
 so that's a downside, but they also would drop off the record as if it had
 never been set on the next compaction after gc grace period.

 Tombstones aren't intrinsically bad, but they can have some bad properties
 in certain situations.  This doesn't strike me as one of them.  If you have
 a way to avoid inserting null when you know you aren't occluding an
 underlying value, that would be ideal.  But because the tombstone would sit
 adjacent on disk to other values from the same insert, even if you were on
 platters, the drive head is *already positioned* over the tombstone
 location when it's read, because it read the prior value and subsequent
 value which were written during the same insert.

 In the end, inserting a tombstone into a non-clustered column shouldn't be
 appreciably worse (if it is at all) than inserting a value instead.  Or am
 I missing something here?

 On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Thank you all for the advice!



 I have decided to use the Insert query builder (
 *com.datastax.driver.core.querybuilder.Insert*) which allows me to
 dynamically insert as many or as few columns as I need, and doesn’t require
 multiple prepared statements. Then, I will look at Ali’s suggestion – I
 will create a small helper method like ‘addToInsertIfNotNull’ and pump all
 my values into that, which will then filter out the ones that are null.
 Should keep the code nice and neat – I will feed back if I find any
 problems with this approach (but please jump in if you have already spotted
 any :)).



 Thanks!

 Matt



 *From:* Robert Wille [mailto:rwi...@fold3.com]
 *Sent:* 29 April 2015 15:16
 *To:* user@cassandra.apache.org
 *Subject:* Re: Inserting null values



 I’ve come across the same thing. I have a table with at least half a
 dozen columns that could be null, in any combination. Having a prepared
 statement for each permutation of null columns just isn’t going to happen.
 I don’t want to build custom queries each time because I have a really cool
 system of managing my queries that relies on them being prepared.



 Fortunately for me, I should have at most a handful of tombstones in each
 partition, and most of my records are written exactly once. So, I just let
 the tombstones get written and they’ll eventually get compacted out and
 life will go on.



 It’s annoying and not ideal, but what can you do?



 On Apr 29, 2015, at 2:36 AM, Matthew Johnson matt.john...@algomi.com
 wrote:



 Hi all,



 I have some fields that I am storing into Cassandra, but some of them
 could be null at any given point. As there are quite a lot of them, it
 makes the code much more readable if I don’t check each one for null before
 adding it to the INSERT.



 I can see a few Jiras around CQL 3 supporting inserting nulls:



 https://issues.apache.org/jira/browse/CASSANDRA-3783

 https://issues.apache.org/jira/browse/CASSANDRA-5648



 But I have tested inserting null and it seems to work fine (when querying
 the table with cqlsh, it shows up as a red lowercase *null*).



 Are there any obvious pitfalls to look out for that I have missed? Could
 it be a performance concern to insert a row with some nulls, as opposed to
 checking the values first and inserting the row and just omitting those
 columns?



 Thanks!

 Matt







Re: Inserting null values

2015-04-29 Thread Robert Coli
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens migh...@gmail.com wrote:

 In the end, inserting a tombstone into a non-clustered column shouldn't be
 appreciably worse (if it is at all) than inserting a value instead.  Or am
 I missing something here?


There's thresholds (log messages, etc.) which operate on tombstone counts
over a certain number, but not on column counts over the same number.

Given that tombstones are often smaller than data columns, sorta hard to
understand conceptually?

=Rob


Re: Inserting null values

2015-04-29 Thread Eric Stevens
But we're talking about a single tombstone on each of a finite (small) set
of values, right?  We're not talking about INSERTs which are 99% nulls (at
least I don't think that's what Matthew was suggesting).  Unless you're
engaging in the antipattern of repeated overwrite, I'm still struggling to
see why this is worse than an equivalent number of non-tombstoned writes.
In fact from the description I don't think we're talking about these
tombstones even occluding any value at all.

 imagine a multi tb sstable w/ 99% tombstones

Let's play with this hypothetical, which doesn't seem like a probable
consequence of the original question.  You'd have to have taken enough
writes *inside* gc grace period to have even produced a multi-TB sstable to
come anywhere near this, and even then this either exceeds or comes really
close to the recommended maximum total data size per node (let alone in a
single sstable).  If you did have such an sstable, it doesn't seem very
likely to compact again inside gc grace period short of manually triggered
major compaction.

But let's assume you do that, you run cassandra stress inserting nothing
but tombstones, and kick off major compaction periodically.  If it
compacted inside gc grace period, is this worse for compaction than the
same number of non-tombstoned values (i.e. a multi-TB sstable is costly to
compact no matter what the contents)?  If it compacted outside gc grace
period, then 99% of the work is just dropping tombstones, it seems like it
would run really fast (for being an absurdly large sstable), as there would
be just 1% of the contents to actually copy over to the new sstable.

I'm still not clear on what I'm missing.  Is a tombstone more expensive to
compact than a non-tombstone?

On Wed, Apr 29, 2015 at 10:06 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Enough tombstones can inflate the size of an SSTable causing issues during
 compaction (imagine a multi tb sstable w/ 99% tombstones) even if there's
 no clustering key defined.

 Perhaps an edge case, but worth considering.

 On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens migh...@gmail.com wrote:

 Correct me if I'm wrong, but tombstones are only really problematic if
 you have them going into clustering keys, then perform a range select on
 that column, right (assuming it's not a symptom of the antipattern of
 indefinitely overwriting the same value)?  I.E. you're deleting clusters
 off of a partition.  A tombstone isn't any more costly, and in some ways
 less costly than a normal column (it's a smaller size at rest than, say,
 inserting an empty string or other default value as someone suggested).

 Tombstones stay around a little longer post-compaction than other values,
 so that's a downside, but they also would drop off the record as if it had
 never been set on the next compaction after gc grace period.

 Tombstones aren't intrinsically bad, but they can have some bad
 properties in certain situations.  This doesn't strike me as one of them.
 If you have a way to avoid inserting null when you know you aren't
 occluding an underlying value, that would be ideal.  But because the
 tombstone would sit adjacent on disk to other values from the same insert,
 even if you were on platters, the drive head is *already positioned* over
 the tombstone location when it's read, because it read the prior value and
 subsequent value which were written during the same insert.

 In the end, inserting a tombstone into a non-clustered column shouldn't
 be appreciably worse (if it is at all) than inserting a value instead.  Or
 am I missing something here?

 On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson matt.john...@algomi.com
  wrote:

 Thank you all for the advice!



 I have decided to use the Insert query builder (
 *com.datastax.driver.core.querybuilder.Insert*) which allows me to
 dynamically insert as many or as few columns as I need, and doesn’t require
 multiple prepared statements. Then, I will look at Ali’s suggestion – I
 will create a small helper method like ‘addToInsertIfNotNull’ and pump all
 my values into that, which will then filter out the ones that are null.
 Should keep the code nice and neat – I will feed back if I find any
 problems with this approach (but please jump in if you have already spotted
 any :)).



 Thanks!

 Matt



 *From:* Robert Wille [mailto:rwi...@fold3.com]
 *Sent:* 29 April 2015 15:16
 *To:* user@cassandra.apache.org
 *Subject:* Re: Inserting null values



 I’ve come across the same thing. I have a table with at least half a
 dozen columns that could be null, in any combination. Having a prepared
 statement for each permutation of null columns just isn’t going to happen.
 I don’t want to build custom queries each time because I have a really cool
 system of managing my queries that relies on them being prepared.



 Fortunately for me, I should have at most a handful of tombstones in
 each partition, and most of my records are written exactly once. So, I just
 let the tombstones

Re: Inserting null values

2015-04-29 Thread Philip Thompson
In a way, yes. A tombstone will only be removed after gc_grace iff the
compaction is sure that it contains all rows which that tombstone might
shadow. When two non-tombstone conflicting rows are compacted, it's always
just LWW.

On Wed, Apr 29, 2015 at 2:42 PM, Eric Stevens migh...@gmail.com wrote:

 But we're talking about a single tombstone on each of a finite (small) set
 of values, right?  We're not talking about INSERTs which are 99% nulls (at
 least I don't think that's what Matthew was suggesting).  Unless you're
 engaging in the antipattern of repeated overwrite, I'm still struggling to
 see why this is worse than an equivalent number of non-tombstoned writes.
 In fact from the description I don't think we're talking about these
 tombstones even occluding any value at all.

  imagine a multi tb sstable w/ 99% tombstones

 Let's play with this hypothetical, which doesn't seem like a probable
 consequence of the original question.  You'd have to have taken enough
 writes *inside* gc grace period to have even produced a multi-TB sstable
 to come anywhere near this, and even then this either exceeds or comes
 really close to the recommended maximum total data size per node (let alone
 in a single sstable).  If you did have such an sstable, it doesn't seem
 very likely to compact again inside gc grace period short of manually
 triggered major compaction.

 But let's assume you do that, you run cassandra stress inserting nothing
 but tombstones, and kick off major compaction periodically.  If it
 compacted inside gc grace period, is this worse for compaction than the
 same number of non-tombstoned values (i.e. a multi-TB sstable is costly to
 compact no matter what the contents)?  If it compacted outside gc grace
 period, then 99% of the work is just dropping tombstones, it seems like it
 would run really fast (for being an absurdly large sstable), as there would
 be just 1% of the contents to actually copy over to the new sstable.

 I'm still not clear on what I'm missing.  Is a tombstone more expensive to
 compact than a non-tombstone?

 On Wed, Apr 29, 2015 at 10:06 AM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 Enough tombstones can inflate the size of an SSTable causing issues
 during compaction (imagine a multi tb sstable w/ 99% tombstones) even if
 there's no clustering key defined.

 Perhaps an edge case, but worth considering.

 On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens migh...@gmail.com wrote:

 Correct me if I'm wrong, but tombstones are only really problematic if
 you have them going into clustering keys, then perform a range select on
 that column, right (assuming it's not a symptom of the antipattern of
 indefinitely overwriting the same value)?  I.E. you're deleting clusters
 off of a partition.  A tombstone isn't any more costly, and in some ways
 less costly than a normal column (it's a smaller size at rest than, say,
 inserting an empty string or other default value as someone suggested).

 Tombstones stay around a little longer post-compaction than other
 values, so that's a downside, but they also would drop off the record as if
 it had never been set on the next compaction after gc grace period.

 Tombstones aren't intrinsically bad, but they can have some bad
 properties in certain situations.  This doesn't strike me as one of them.
 If you have a way to avoid inserting null when you know you aren't
 occluding an underlying value, that would be ideal.  But because the
 tombstone would sit adjacent on disk to other values from the same insert,
 even if you were on platters, the drive head is *already positioned* over
 the tombstone location when it's read, because it read the prior value and
 subsequent value which were written during the same insert.

 In the end, inserting a tombstone into a non-clustered column shouldn't
 be appreciably worse (if it is at all) than inserting a value instead.  Or
 am I missing something here?

 On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson 
 matt.john...@algomi.com wrote:

 Thank you all for the advice!



 I have decided to use the Insert query builder (
 *com.datastax.driver.core.querybuilder.Insert*) which allows me to
 dynamically insert as many or as few columns as I need, and doesn’t require
 multiple prepared statements. Then, I will look at Ali’s suggestion – I
 will create a small helper method like ‘addToInsertIfNotNull’ and pump all
 my values into that, which will then filter out the ones that are null.
 Should keep the code nice and neat – I will feed back if I find any
 problems with this approach (but please jump in if you have already spotted
 any :)).



 Thanks!

 Matt



 *From:* Robert Wille [mailto:rwi...@fold3.com]
 *Sent:* 29 April 2015 15:16
 *To:* user@cassandra.apache.org
 *Subject:* Re: Inserting null values



 I’ve come across the same thing. I have a table with at least half a
 dozen columns that could be null, in any combination. Having a prepared
 statement for each permutation of null columns just isn’t

Re: Inserting null values

2015-04-29 Thread Robert Wille
I’ve come across the same thing. I have a table with at least half a dozen 
columns that could be null, in any combination. Having a prepared statement for 
each permutation of null columns just isn’t going to happen. I don’t want to 
build custom queries each time because I have a really cool system of managing 
my queries that relies on them being prepared.

Fortunately for me, I should have at most a handful of tombstones in each 
partition, and most of my records are written exactly once. So, I just let the 
tombstones get written and they’ll eventually get compacted out and life will 
go on.

It’s annoying and not ideal, but what can you do?

On Apr 29, 2015, at 2:36 AM, Matthew Johnson 
matt.john...@algomi.commailto:matt.john...@algomi.com wrote:

Hi all,

I have some fields that I am storing into Cassandra, but some of them could be 
null at any given point. As there are quite a lot of them, it makes the code 
much more readable if I don’t check each one for null before adding it to the 
INSERT.

I can see a few Jiras around CQL 3 supporting inserting nulls:

https://issues.apache.org/jira/browse/CASSANDRA-3783
https://issues.apache.org/jira/browse/CASSANDRA-5648

But I have tested inserting null and it seems to work fine (when querying the 
table with cqlsh, it shows up as a red lowercase null).

Are there any obvious pitfalls to look out for that I have missed? Could it be 
a performance concern to insert a row with some nulls, as opposed to checking 
the values first and inserting the row and just omitting those columns?

Thanks!
Matt



RE: Inserting null values

2015-04-29 Thread Matthew Johnson
Thank you all for the advice!



I have decided to use the Insert query builder (
*com.datastax.driver.core.querybuilder.Insert*) which allows me to
dynamically insert as many or as few columns as I need, and doesn’t require
multiple prepared statements. Then, I will look at Ali’s suggestion – I
will create a small helper method like ‘addToInsertIfNotNull’ and pump all
my values into that, which will then filter out the ones that are null.
Should keep the code nice and neat – I will feed back if I find any
problems with this approach (but please jump in if you have already spotted
any :)).
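
Roughly, the helper I have in mind would be something like this (untested
sketch, names may change):

    import com.datastax.driver.core.querybuilder.Insert;

    public final class InsertHelper {

        private InsertHelper() {}

        // Adds the column only when the value is non-null, so nulls never reach
        // the INSERT and therefore never turn into tombstones.
        public static Insert addToInsertIfNotNull(Insert insert, String column, Object value) {
            return value != null ? insert.value(column, value) : insert;
        }
    }

    // Usage sketch:
    //   Insert insert = QueryBuilder.insertInto("my_ks", "my_table").value("id", id);
    //   insert = InsertHelper.addToInsertIfNotNull(insert, "name", name);
    //   insert = InsertHelper.addToInsertIfNotNull(insert, "amount", amount);
    //   session.execute(insert);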



Thanks!

Matt



*From:* Robert Wille [mailto:rwi...@fold3.com]
*Sent:* 29 April 2015 15:16
*To:* user@cassandra.apache.org
*Subject:* Re: Inserting null values



I’ve come across the same thing. I have a table with at least half a dozen
columns that could be null, in any combination. Having a prepared statement
for each permutation of null columns just isn’t going to happen. I don’t
want to build custom queries each time because I have a really cool system
of managing my queries that relies on them being prepared.



Fortunately for me, I should have at most a handful of tombstones in each
partition, and most of my records are written exactly once. So, I just let
the tombstones get written and they’ll eventually get compacted out and
life will go on.



It’s annoying and not ideal, but what can you do?



On Apr 29, 2015, at 2:36 AM, Matthew Johnson matt.john...@algomi.com
wrote:



Hi all,



I have some fields that I am storing into Cassandra, but some of them could
be null at any given point. As there are quite a lot of them, it makes the
code much more readable if I don’t check each one for null before adding it
to the INSERT.



I can see a few Jiras around CQL 3 supporting inserting nulls:



https://issues.apache.org/jira/browse/CASSANDRA-3783

https://issues.apache.org/jira/browse/CASSANDRA-5648



But I have tested inserting null and it seems to work fine (when querying
the table with cqlsh, it shows up as a red lowercase *null*).



Are there any obvious pitfalls to look out for that I have missed? Could it
be a performance concern to insert a row with some nulls, as opposed to
checking the values first and inserting the row and just omitting those
columns?



Thanks!

Matt