Re: Internal Handling of Map Updates

2016-06-02 Thread Eric Stevens
If it's only overwrites and appends with no removes, an UPDATE will let you
do that with standard (non-frozen) collections. Like INSERT, UPDATE acts as
an upsert.


Re: Internal Handling of Map Updates

2016-06-02 Thread Matthias Niehoff
JSON would be an option, yes. A frozen collection would not work for us, as
the updates are both overwrites of existing values and appends of new
values (but never removals of values).
So we end up with 3 options:

1. use clustering columns
2. use JSON
3. save the row without the spark-cassandra-connector's saveToCassandra()
method (which does an insert of the whole row and map), and instead write our
own save method that uses UPDATE on the map (as Eric proposed).

I think we will go for option 1 or 2 as those are the least costly
solutions.

Nevertheless, it's a pity that an insert on a row with a map will always
create tombstones :-(
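
For what it's worth, option 1 could look roughly like the sketch below. It
assumes the map is something like map<text, timestamp> named
last_modified_by_source (the column name visible in the sstable2json dump).
The table name, the id/source/modified_at column names and the helper
rows_for_event are made up for illustration. Each former map entry becomes
its own clustering row, so an update is an ordinary upsert of that row and
no whole collection is ever replaced.

```python
# Hypothetical schema for option 1: one clustering row per former map entry.
# Table and column names are assumptions, not taken from the real data model.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS catalog_lines_by_source (
    id text,
    source text,
    modified_at timestamp,
    PRIMARY KEY (id, source)
)
"""

# Each entry is written with an ordinary upsert; no collection is replaced.
INSERT_CQL = (
    "INSERT INTO catalog_lines_by_source (id, source, modified_at) "
    "VALUES (?, ?, ?)"
)


def rows_for_event(event_id, last_modified_by_source):
    """Flatten one event's map into (id, source, modified_at) bind tuples."""
    return [
        (event_id, source, modified_at)
        for source, modified_at in sorted(last_modified_by_source.items())
    ]
```

Each tuple returned by rows_for_event would be bound to INSERT_CQL, so
overwriting an existing key and adding a new key are the same kind of write.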




Re: Internal Handling of Map Updates

2016-06-01 Thread Eric Stevens
From that perspective, you could also use a frozen collection which takes
away the ability to append, but for which overwrites shouldn't generate a
tombstone.


Re: Internal Handling of Map Updates

2016-06-01 Thread kurt Greaves
Is there anything stopping you from using JSON instead of a collection?


Re: Internal Handling of Map Updates

2016-05-27 Thread Eric Stevens
If you aren't removing elements from the map, you should instead be able to
use an UPDATE statement and append to the map. It will have the same effect
as overwriting it, because all the new keys will take precedence over the
existing keys. But it'll happen without generating a tombstone first.

If you do have to remove elements from the collection during this process,
you are either facing tombstones or having to surgically figure out which
elements ought to be removed (which also involves tombstones, though at
least not range tombstones, so a bit cheaper).
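
As a rough sketch of the write shapes being compared here, the helpers below
only build parameterized CQL strings. The table name and key column in the
usage note are hypothetical; last_modified_by_source is the map column that
shows up in the sstable2json dump. "map = map + ?" is the append form of
UPDATE, "map[?] = ?" touches a single element, and the full INSERT is the
kind of whole-map write that replaces the collection.

```python
def append_to_map_cql(table, map_column, key_column):
    """UPDATE that merges the bound map into the existing map column."""
    return (
        f"UPDATE {table} "
        f"SET {map_column} = {map_column} + ? "
        f"WHERE {key_column} = ?"
    )


def set_map_entry_cql(table, map_column, key_column):
    """UPDATE that writes a single map element (bound key and value)."""
    return (
        f"UPDATE {table} "
        f"SET {map_column}[?] = ? "
        f"WHERE {key_column} = ?"
    )


def overwrite_map_cql(table, map_column, key_column):
    """INSERT that writes the whole map at once, replacing the collection."""
    return (
        f"INSERT INTO {table} ({key_column}, {map_column}) "
        f"VALUES (?, ?)"
    )
```

For example, append_to_map_cql("catalog_lines", "last_modified_by_source",
"id") yields "UPDATE catalog_lines SET last_modified_by_source =
last_modified_by_source + ? WHERE id = ?", which can be prepared once and
bound with the new entries and the row key.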


Re: Internal Handling of Map Updates

2016-05-27 Thread Matthias Niehoff
We are processing events in Spark and store the resulting entries
(containing a map) in Cassandra. The results can be new (no entry for this
key in Cassandra) or an Update (there is already an entry with this key in
Cassandra). We use the spark-cassandra-connector to store the data in
Cassandra.

The connector will always do an insert of the data and will rely on the
upsert capabilities of Cassandra. So every time an event is updated, the
complete map is replaced, with all the problems of tombstones.
Seems like we have to implement our own persist logic in which we check if
an element already exists and, if so, update the map manually. That would
require a read before write, which would be nasty. Another option would be
not to use a collection but (clustering) columns. Do you have another idea
of doing this?

(the conclusion of this whole thing for me would be: use upsert, but do
specific updates on collections, as an upsert might replace the whole
collection and generate tombstones)



-- 
Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: 

Re: Internal Handling of Map Updates

2016-05-25 Thread kurt Greaves
Literally just encountered this exact same thing. I couldn't find anything
in the official docs related to this but there is at least this blog that
explains it:
http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html
and this entry in ScyllaDB's documentation:
http://www.scylladb.com/kb/sstable-interpretation/
Can confirm what Tyler mentioned, updating a single element does not cause
a tombstone.
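
If anyone wants to check their own dump for this, here is a small sketch that
counts the flagged cells per row. It assumes the sstable2json output is a
JSON array of row objects with "key" and "cells" fields, as in the extract
quoted below, and that "t" in the fourth field is the marker the original
post refers to as the tombstone flag. The function name and the shortened
cell names in the sample are made up for illustration.

```python
import json


def count_flagged_cells(rows, flag="t"):
    """Count cells per row key whose fourth field equals `flag`."""
    counts = {}
    for row in rows:
        cells = row.get("cells", [])
        counts[row["key"]] = sum(
            1 for cell in cells if len(cell) > 3 and cell[3] == flag
        )
    return counts


# Two flagged cells and one live cell; cell names shortened for readability.
SAMPLE = json.loads("""
[{"key": "Betty_StoreCatalogLines:7",
  "cells": [
    ["x:last_modified_by_source:_", "x:last_modified_by_source:!",
     1463040069753999, "t", 1463040069],
    ["x:last_modified_by_source:_", "x:last_modified_by_source:!",
     1463820040628000, "t", 1463820040],
    ["x:last_modified", "2016-05-21 08:40Z", 1463820040628001]
  ]}]
""")

print(count_flagged_cells(SAMPLE))
```

On a real dump you would load the file with json.load and pass the result to
count_flagged_cells; a count that grows with every rewrite of the row is the
pattern described in this thread.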

On 25 May 2016 at 15:37, Tyler Hobbs  wrote:

> If you replace an entire collection, whether it's a map, set, or list, a
> range tombstone will be inserted followed by the new collection.  If you
> only update a single element, no tombstones are generated.
>
> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> Hi,
>>
>> we have a table with a map field. We do not delete anything in this
>> table, but we do update the values, including the map field (most of the
>> time a new value for an existing key, rarely adding new keys). We now
>> encounter a huge number of tombstones for this table.
>>
>> We used sstable2json to take a look into the sstables:
>>
>>
>> {"key": "Betty_StoreCatalogLines:7",
>>
>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>
>> . . .
>>
>>   
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>
>>
>>
>> Looking at the SSTables, it seems like every update of a value in a map
>> breaks down to a delete and insert in the corresponding SSTable (see all
>> the tombstone flags "t" in the extract of sstable2json above).
>>
>> We are using Cassandra 2.2.5.
>>
>> Can you confirm this behavior?
>>
>> Thanks!
>> --
>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>> 172.1702676
>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>> www.more4fi.de
>>
>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen
>> Schütz
>>
>> This e-mail, including any attached files, contains confidential and/or
>> legally protected information. If you are not the intended recipient or
>> have received this e-mail in error, please inform the sender immediately
>> and delete this e-mail and any attached files without delay. Unauthorized
>> copying, use, or opening of any attached files, as well as unauthorized
>> forwarding of this e-mail, is not permitted.
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Internal Handling of Map Updates

2016-05-25 Thread Tyler Hobbs
If you replace an entire collection, whether it's a map, set, or list, a
range tombstone will be inserted followed by the new collection.  If you
only update a single element, no tombstones are generated.
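
To make the distinction concrete, here is a minimal sketch in Python that only builds CQL statement text with bind markers; it does not talk to a cluster. The table name `store_catalog_lines` and the key column `id` are hypothetical, chosen for illustration. `last_modified_by_source` is the map column visible in the sstable2json extract quoted below. The append form is the one suggested elsewhere in this thread for overwrite-and-append workloads.

```python
# Sketch only: builds CQL text, assuming a hypothetical table and key column.

def replace_whole_map(table, map_column, key_column):
    """Overwrites the entire map. Per the explanation above, this writes a
    range tombstone followed by the new collection."""
    return f"UPDATE {table} SET {map_column} = ? WHERE {key_column} = ?"


def append_to_map(table, map_column, key_column):
    """Merges the bound map into the existing one. New keys are added and
    existing keys are overwritten, without replacing the whole collection."""
    return f"UPDATE {table} SET {map_column} = {map_column} + ? WHERE {key_column} = ?"


def set_single_entry(table, map_column, key_column):
    """Updates exactly one map element."""
    return f"UPDATE {table} SET {map_column}[?] = ? WHERE {key_column} = ?"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    args = ("store_catalog_lines", "last_modified_by_source", "id")
    for build in (replace_whole_map, append_to_map, set_single_entry):
        print(build(*args))
```

An INSERT that supplies the full map falls into the first category, since it also replaces the whole collection.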

On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> Hi,
>
> we have a table with a map field. We do not delete anything in this table,
> but we do update its values, including the map field (most of the time a
> new value for an existing key, rarely adding new keys). We now encounter a
> huge number of tombstones for this table.
>
> We used sstable2json to take a look into the sstables:
>
>
> {"key": "Betty_StoreCatalogLines:7",
>
>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>
> . . .
>
>   
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>
>
> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001],
>
>
> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>
>
>
> Looking at the SSTables, it seems that every update of a value in a map
> breaks down to a delete and an insert in the corresponding SSTable (see all
> the tombstone flags "t" in the sstable2json extract above).
>
> We are using Cassandra 2.2.5.
>
> Can you confirm this behavior?
>
> Thanks!
> --
> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
> 172.1702676
> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
> www.more4fi.de
>



-- 
Tyler Hobbs
DataStax