You're right, there is no data in a tombstone, only a column name, so a delete adds only a small disk-size overhead. But I must agree with the post above: deleting before inserting is pointless. Moreover, it takes one more operation to compute the resulting row.
Cheers,
Olek
2014-09-10 22:18 GMT+02:00 graham sanderson <gra...@vast.com>:
> delete inserts a tombstone, which is likely smaller than the original record
> (though it still (currently) carries the overhead of the full key/column name).
> The data for the insert after a delete would be identical to the data if you
> just inserted/updated.
>
> No real benefit I can think of for doing the delete first.
>
> On Sep 10, 2014, at 2:25 PM, olek.stas...@gmail.com wrote:
>
>> I think so.
>> This is how I see it:
>> at the very beginning you have a line like this in the data file:
>> {key: [col_name, col_value, date_of_last_change]} //something similar,
>> I don't remember now
>>
>> after the delete you're adding the line:
>> {key:[col_name, last_col_value, date_of_delete, 'd']} //this 'd'
>> indicates that the field is deleted
>> after the insert the following line is added:
>> {key: [col_name, col_value, date_of_insert]}
>> So delete and then insert generates 2 lines in the data file.
>>
>> After a pure insert (an upsert, in fact) you will have only one line:
>> {key: [col_name, col_value, date_of_insert]}
>> So, summarizing: in the second scenario you have only one line, in the first: two.
>> I hope my post is correct ;)
>> Regards,
>> Olek
>>
>> 2014-09-10 18:56 GMT+02:00 Michal Budzyn <michalbud...@gmail.com>:
>>> Would the factor before compaction always be 2?
>>>
>>> On Wed, Sep 10, 2014 at 6:38 PM, olek.stas...@gmail.com
>>> <olek.stas...@gmail.com> wrote:
>>>>
>>>> IMHO, delete then insert will take twice as much disk space as a
>>>> single insert. But after compaction the difference will disappear.
>>>> This was true in versions prior to 2.0, but it should still work this
>>>> way. But maybe someone will correct me if I'm wrong.
>>>> Cheers,
>>>> Olek
>>>>
>>>> 2014-09-10 18:30 GMT+02:00 Michal Budzyn <michalbud...@gmail.com>:
>>>>> One insert would be much better, e.g. for performance and network
>>>>> latency.
>>>>> I wanted to know if there is a significant difference (apart from the
>>>>> additional commit log entry) in the storage used between these 2 use cases.
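A toy model of what this thread describes, written as a hedged Python sketch rather than a description of Cassandra's actual storage engine. `Cell`, `approx_size`, `read` and `compact` are hypothetical names, and the size formula (key + column name + 8-byte timestamp + value bytes) is an assumption for illustration only. The sketch also ignores gc_grace_seconds (which governs how long a surviving tombstone is kept), the fact that only SSTables compacted together get merged, and Cassandra's rule that a delete wins a timestamp tie.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One entry in a toy append-only data file, shaped like the thread's
    {key: [col_name, col_value, timestamp]} lines. Not Cassandra's real format."""
    key: str
    column: str
    timestamp: int
    value: Optional[bytes]  # None = tombstone: key, column name and timestamp, no data

    def approx_size(self) -> int:
        # Hypothetical byte count: key + column name + 8-byte timestamp + value.
        value_len = len(self.value) if self.value is not None else 0
        return len(self.key) + len(self.column) + 8 + value_len


def read(cells: Iterable[Cell], key: str, column: str) -> Tuple[Optional[bytes], int]:
    """Reconcile every cell for (key, column): the highest timestamp wins.
    Returns the visible value (None if deleted) and how many cells were merged."""
    matching = [c for c in cells if c.key == key and c.column == column]
    if not matching:
        return None, 0
    # Timestamps are assumed distinct; tie-breaking (deletes win) is not modelled.
    newest = max(matching, key=lambda c: c.timestamp)
    return newest.value, len(matching)


def compact(cells: Iterable[Cell]) -> List[Cell]:
    """Keep only the newest cell per (key, column); shadowed cells, including
    a tombstone overwritten by a later insert, are dropped."""
    newest: Dict[Tuple[str, str], Cell] = {}
    for c in cells:
        slot = (c.key, c.column)
        if slot not in newest or c.timestamp > newest[slot].timestamp:
            newest[slot] = c
    return sorted(newest.values(), key=lambda c: (c.key, c.column))


existing = [Cell("k1", "c", 1, b"old value")]
delete_then_insert = [Cell("k1", "c", 2, None), Cell("k1", "c", 3, b"new value")]
upsert = [Cell("k1", "c", 3, b"new value")]

print(len(delete_then_insert), len(upsert))                    # 2 1
print(sum(c.approx_size() for c in delete_then_insert),
      sum(c.approx_size() for c in upsert))                     # 31 20
print(read(existing + delete_then_insert, "k1", "c"))           # (b'new value', 3)
print(read(existing + upsert, "k1", "c"))                       # (b'new value', 2)
print(compact(existing + delete_then_insert) == compact(existing + upsert))  # True
```

Under these assumptions the entry count before compaction is 2 vs 1, but the byte count is 31 vs 20 because the tombstone carries no value; the read has to merge one extra cell; and after compaction both histories collapse to the same single cell.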