Is there anything stopping you from using JSON instead of a collection?

On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
> If you aren't removing elements from the map, you should instead be able
> to use an UPDATE statement and append to the map. It will have the same
> effect as overwriting it, because all the new keys will take precedence
> over the existing keys. But it'll happen without generating a tombstone
> first.
>
> If you do have to remove elements from the collection during this process,
> you are either facing tombstones or having to surgically figure out which
> elements ought to be removed (which also involves tombstones, though at
> least not range tombstones, so a bit cheaper).
>
> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> We are processing events in Spark and store the resulting entries
>> (containing a map) in Cassandra. The results can be new (no entry for this
>> key in Cassandra) or an update (there is already an entry with this key in
>> Cassandra). We use the spark-cassandra-connector to store the data in
>> Cassandra.
>>
>> The connector will always do an insert of the data and will rely on the
>> upsert capabilities of Cassandra. So every time an event is updated, the
>> complete map is replaced, with all the problems of tombstones.
>> It seems like we have to implement our own persist logic in which we check
>> if an element already exists and, if yes, update the map manually. That
>> would require a read before write, which would be nasty. Another option
>> would be not to use a collection but (clustering) columns. Do you have
>> another idea for doing this?
>>
>> (The conclusion of this whole thing for me would be: use upsert, but do
>> specific updates on collections, as an upsert might replace the whole
>> collection and generate tombstones.)
>>
>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>
>>> If you replace an entire collection, whether it's a map, set, or list, a
>>> range tombstone will be inserted followed by the new collection.
>>> If you only update a single element, no tombstones are generated.
>>>
>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>> matthias.nieh...@codecentric.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> we have a table with a map field. We do not delete anything in this
>>>> table, but we do updates on the values, including the map field (most of
>>>> the time a new value for an existing key, rarely adding new keys). We now
>>>> encounter a huge number of tombstones for this table.
>>>>
>>>> We used sstable2json to take a look into the SSTables:
>>>>
>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>  . . .
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>  ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>
>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>> breaks down to a delete and an insert in the corresponding SSTable (see
>>>> all the tombstone flags "t" in the extract of sstable2json above).
>>>>
>>>> We are using Cassandra 2.2.5.
>>>>
>>>> Can you confirm this behavior?
>>>>
>>>> Thanks!
>>>> --
>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0)
>>>> 172.1702676
>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>> www.more4fi.de
>>>>
>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>> Supervisory board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen
>>>> Schütz
>>>>
>>>> This e-mail, including any attached files, contains confidential and/or
>>>> legally protected information. If you are not the intended recipient or
>>>> have received this e-mail in error, please notify the sender immediately
>>>> and delete this e-mail and any attached files without delay.
>>>> Unauthorized copying, use, or opening of any attached files, as well as
>>>> unauthorized forwarding of this e-mail, is not permitted.
>>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>

--
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
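
For reference, here is a minimal sketch of the two write patterns discussed in the quoted thread: replacing the whole map with an INSERT versus appending to it with an UPDATE. It only builds CQL statement strings with `?` bind markers and does not talk to a cluster. The table name `events` and key column `id` are hypothetical; `last_modified_by_source` is the map column visible in the sstable2json extract. The helper names are made up for illustration.

```python
def overwrite_statement(table: str, key_column: str, map_column: str) -> str:
    """Build an INSERT that writes the whole map.

    Per the thread, replacing an entire collection inserts a range
    tombstone followed by the new collection.
    """
    return f"INSERT INTO {table} ({key_column}, {map_column}) VALUES (?, ?)"


def append_statement(table: str, key_column: str, map_column: str) -> str:
    """Build an UPDATE that merges new entries into the existing map.

    Per the thread, new keys take precedence over existing ones and no
    tombstone is generated, as long as no elements have to be removed.
    """
    return (
        f"UPDATE {table} SET {map_column} = {map_column} + ? "
        f"WHERE {key_column} = ?"
    )


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    print(overwrite_statement("events", "id", "last_modified_by_source"))
    print(append_statement("events", "id", "last_modified_by_source"))
```

The append form binds only the entries that changed, so it needs no read before write. Whether the spark-cassandra-connector can be made to emit this form instead of a plain insert is not settled in this thread.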