Is there anything stopping you from using JSON instead of a collection?
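
A rough sketch of what that could look like in Python, assuming the map is
only ever read and written as a whole. Overwriting a plain text column
replaces a single cell, so no range tombstone is needed. The table and column
names in the CQL string are hypothetical, and the timestamp value is made up.

```python
import json


def encode_map(entries):
    """Serialize a dict to compact, deterministic JSON for a text column."""
    return json.dumps(entries, sort_keys=True, separators=(",", ":"))


def decode_map(text):
    """Parse the stored JSON text back into a dict (empty if nothing stored)."""
    return json.loads(text) if text else {}


# Hypothetical schema: events(id text PRIMARY KEY, last_modified_by_source text)
INSERT_CQL = "INSERT INTO events (id, last_modified_by_source) VALUES (%s, %s)"

# Illustrative values only; the timestamp is not taken from the thread.
params = ("276-1-6MPQ0RI", encode_map({"betty_store_catalog_lines": 1463820040000}))
print(INSERT_CQL, params)
```

The trade-off is that individual entries can no longer be updated or queried
server-side; every change rewrites the whole JSON value.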

On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:

> If you aren't removing elements from the map, you should instead be able
> to use an UPDATE statement and append to the map. It will have the same effect
> as overwriting it, because all the new keys will take precedence over the
> existing keys. But it'll happen without generating a tombstone first.
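
A minimal sketch of the append form described above, in Python. The table name
`events` and key column `id` are assumptions for illustration, and the map
column name is borrowed from the sstable2json extract further down. The helper
only builds the CQL string; it does not talk to a cluster.

```python
def map_append_cql(table, map_column, key_columns):
    """Build an UPDATE that merges new entries into a map column.

    Uses the CQL form `SET m = m + <map>`, which writes one cell per entry
    instead of replacing the whole collection.
    """
    where = " AND ".join(f"{col} = %s" for col in key_columns)
    return f"UPDATE {table} SET {map_column} = {map_column} + %s WHERE {where}"


# Hypothetical table and key column names, for illustration only.
stmt = map_append_cql("events", "last_modified_by_source", ["id"])
print(stmt)
```

With the DataStax Python driver this could presumably be run as
`session.execute(stmt, (new_entries, row_id))`, where `new_entries` is a dict
holding only the keys to add or change.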
>
> If you do have to remove elements from the collection during this process,
> you are either facing tombstones or having to surgically figure out which
> elements ought to be removed (which also involves tombstones, though at
> least not range tombstones, so a bit cheaper).
>
> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> We are processing events in Spark and storing the resulting entries
>> (containing a map) in Cassandra. The results can be new (no entry for this
>> key in Cassandra) or an update (there is already an entry with this key in
>> Cassandra). We use the spark-cassandra-connector to store the data in
>> Cassandra.
>>
>> The connector will always do an insert of the data and will rely on the
>> upsert capabilities of Cassandra. So every time an event is updated, the
>> complete map is replaced, with all the problems of tombstones.
>> Seems like we have to implement our own persist logic in which we check
>> if an element already exists and, if so, update the map manually. That would
>> require a read before write, which would be nasty. Another option would be
>> not to use a collection but (clustering) columns. Do you have another idea
>> for doing this?
>>
>> (The conclusion of this whole thing for me would be: use upsert, but do
>> specific updates on collections, as an upsert might replace the whole
>> collection and generate tombstones.)
>>
>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>
>>> If you replace an entire collection, whether it's a map, set, or list, a
>>> range tombstone will be inserted followed by the new collection.  If you
>>> only update a single element, no tombstones are generated.
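
To make the difference concrete, here is a toy Python model of a single map
column. It is only a sketch and not Cassandra's storage engine: the class and
method names are made up, and placing the range tombstone one tick below the
write timestamp simply mirrors the pattern visible in the sstable2json extract
quoted below (tombstone at ...628000, cells at ...628001).

```python
class ToyMapColumn:
    """Toy model of one map column; not Cassandra's real storage engine."""

    def __init__(self):
        self.cells = {}             # map key -> (value, write timestamp)
        self.range_tombstones = []  # deletion timestamps, one per overwrite

    def _write_cell(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts >= current[1]:  # last write wins
            self.cells[key] = (value, ts)

    def overwrite(self, entries, ts):
        # Whole-collection write: a range tombstone just below the write
        # timestamp, followed by the new cells.
        self.range_tombstones.append(ts - 1)
        for key, value in entries.items():
            self._write_cell(key, value, ts)

    def put(self, key, value, ts):
        # Single-element update: just a new cell, no tombstone.
        self._write_cell(key, value, ts)

    def read(self):
        # Only cells newer than the latest range tombstone are visible.
        deleted_up_to = max(self.range_tombstones, default=-1)
        return {k: v for k, (v, ts) in self.cells.items() if ts > deleted_up_to}


replaced, updated = ToyMapColumn(), ToyMapColumn()
for ts, value in [(10, "a"), (20, "b"), (30, "c")]:
    replaced.overwrite({"source": value}, ts)
    updated.put("source", value, ts)

print(replaced.read(), len(replaced.range_tombstones))  # {'source': 'c'} 3
print(updated.read(), len(updated.range_tombstones))    # {'source': 'c'} 0
```

Both columns read back the same value, but the overwritten one has
accumulated one range tombstone per write.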
>>>
>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>> matthias.nieh...@codecentric.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> we have a table with a map field. We do not delete anything in this
>>>> table, but do updates on the values including the map field (most of the
>>>> time a new value for an existing key, rarely adding new keys). We now
>>>> encounter a huge number of tombstones for this table.
>>>>
>>>> We used sstable2json to take a look into the sstables:
>>>>
>>>>
>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>> . . .
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>            ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>
>>>>
>>>>
>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>> breaks down to a delete and an insert in the corresponding SSTable (see all
>>>> the tombstone flags "t" in the extract of sstable2json above).
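
Counting those "t" entries by hand gets tedious on larger partitions. A small
Python sketch, assuming each row has the shape shown in the extract above
(a dict with a "cells" list, where flagged cells carry the flag as their
fourth element). The sample row is shortened and made up; it is not real
output.

```python
import json


def count_range_tombstones(row):
    """Count cells flagged "t" in one row parsed from sstable2json output."""
    return sum(1 for cell in row["cells"] if len(cell) > 3 and cell[3] == "t")


# Shortened, made-up row in the same shape as the extract above.
sample = json.loads('''
{"key": "k:1",
 "cells": [["c:", "", 3],
           ["c:m:_", "c:m:!", 1, "t", 1],
           ["c:m:_", "c:m:!", 2, "t", 2],
           ["c:m:6b", "01", 3]]}
''')
print(count_range_tombstones(sample))  # 2
```

For a complete dump, the same function would be applied to each row object in
the file.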
>>>>
>>>> We are using Cassandra 2.2.5.
>>>>
>>>> Can you confirm this behavior?
>>>>
>>>> Thanks!
>>>> --
>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>>>> 172.1702676
>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>> www.more4fi.de
>>>>
>>>> Registered office: Solingen | HRB 25917 | Wuppertal District Court
>>>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>> Supervisory board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen
>>>> Schütz
>>>>
>>>> This e-mail, including any attached files, contains confidential
>>>> and/or legally protected information. If you are not the intended
>>>> recipient or have received this e-mail in error, please inform the
>>>> sender immediately and delete this e-mail and any attached files
>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>> permitted.
>>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>>
>>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
