You're right, there is no data in a tombstone, only the column name. So
there is only a small disk-space overhead after a delete. But I must
agree with the post above: deleting prior to inserting is pointless.
Moreover, it takes one more operation to compute the resulting row.
cheers,
Olek

2014-09-10 22:18 GMT+02:00 graham sanderson <gra...@vast.com>:
> delete inserts a tombstone, which is likely smaller than the original record
> (though it still (currently) carries the overhead of the full key/column name).
> The data for the insert after a delete would be identical to the data if you
> just inserted/updated.
>
> no real benefit I can think of for doing the delete first.
>
> On Sep 10, 2014, at 2:25 PM, olek.stas...@gmail.com wrote:
>
>> I think so.
>> This is how I see it:
>> at the very beginning you have a line like this in the data file:
>> {key: [col_name, col_value, date_of_last_change]} // something similar,
>> I don't remember exactly
>>
>> after the delete, this line is added:
>> {key: [col_name, last_col_value, date_of_delete, 'd']} // the 'd'
>> indicates that the field is deleted
>> after the insert, the following line is added:
>> {key: [col_name, col_value, date_of_insert]}
>> So delete and then insert generates 2 lines in the data file.
>>
>> After a pure insert (an upsert, in fact) you will have only one line:
>> {key: [col_name, col_value, date_of_insert]}
>> So, to summarize: in the second scenario you have only one line; in the first, two.
>> I hope my post is correct ;)
>> regards,
>> Olek
>>
>> 2014-09-10 18:56 GMT+02:00 Michal Budzyn <michalbud...@gmail.com>:
>>> Would the factor before compaction always be 2?
>>>
>>> On Wed, Sep 10, 2014 at 6:38 PM, olek.stas...@gmail.com
>>> <olek.stas...@gmail.com> wrote:
>>>>
>>>> IMHO, delete then insert will take twice as much disk space as a
>>>> single insert. But after compaction the difference will disappear.
>>>> This was true in versions prior to 2.0, and it should still work this
>>>> way. But maybe someone will correct me if I'm wrong.
>>>> Cheers,
>>>> Olek
>>>>
>>>> 2014-09-10 18:30 GMT+02:00 Michal Budzyn <michalbud...@gmail.com>:
>>>>> One insert would be much better, e.g. for performance and network
>>>>> latency.
>>>>> I wanted to know if there is a significant difference (apart from the
>>>>> additional commit log entry) in the storage used between these 2 use cases.
>>>>>
>>>
>>>
>
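The two write patterns discussed in this thread can be sketched as a toy model in Python. This is only an illustration under loud assumptions, not Cassandra's actual storage format: the `Cell` class and the `compact` function are hypothetical names, each write is modelled as one appended record, and compaction simply keeps the newest record per (key, column). Real compaction also keeps tombstones around for a grace period (`gc_grace_seconds`), which the sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One appended record in the toy data file (hypothetical layout)."""
    key: str
    column: str
    value: Optional[str]  # None for a tombstone: it carries no data
    timestamp: int
    deleted: bool = False


def compact(cells: List[Cell]) -> List[Cell]:
    """Keep the newest cell per (key, column) and drop surviving tombstones."""
    newest = {}
    for cell in cells:
        slot = (cell.key, cell.column)
        if slot not in newest or cell.timestamp > newest[slot].timestamp:
            newest[slot] = cell
    return [c for c in newest.values() if not c.deleted]


# Scenario 1: delete, then insert. Two records are appended after the original.
delete_then_insert = [
    Cell("k1", "name", "old", 1),
    Cell("k1", "name", None, 2, deleted=True),  # tombstone, name only
    Cell("k1", "name", "new", 3),
]

# Scenario 2: plain insert (an upsert). One record is appended.
upsert_only = [
    Cell("k1", "name", "old", 1),
    Cell("k1", "name", "new", 3),
]

print(len(delete_then_insert), len(upsert_only))
print(compact(delete_then_insert) == compact(upsert_only))
```

In this model the delete-first pattern stores one extra (small) tombstone record until compaction runs, and after compaction both patterns leave the same single cell, which matches the conclusion above that the delete buys nothing.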
