On Sep 23, 2010, at 12:25 PM, Robert Newson wrote:
> The idea also doesn't account for the waste in obsolete b+tree nodes.
> Basically, it's more complicated than that.
>
> Compaction is unavoidable with an append-only strategy. One idea I've
> pitched (and frankly stolen from Berkeley JE) is for the database file
> to be a series of files instead of a single one.
Bitcask takes a somewhat similar approach to JE in its use of multiple files.
> If we track the used
> space in each file, we can compact any file that drops below a
> threshold (by copying the extant data to the new tail and deleting the
> old file). This is still compaction but it's no longer a wholesale
> rewrite of the database.
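To make the threshold idea concrete, here is a minimal sketch of the "series of files" scheme — segment files with a per-file live-byte count, compacted when the live fraction drops below a cutoff. All class and field names here are illustrative, not CouchDB's (or JE's) actual internals:

```python
# Hypothetical sketch: the database is a sequence of append-only segment
# files; we track live bytes per segment and compact any segment whose
# live fraction drops below a threshold by copying its surviving records
# to the current tail segment and dropping the old file.

COMPACT_THRESHOLD = 0.25  # compact when under 25% of a segment is live


class Segment:
    def __init__(self, seg_id):
        self.seg_id = seg_id
        self.total_bytes = 0
        self.live_bytes = 0
        self.records = {}  # key -> value, standing in for file contents


class SegmentedDb:
    def __init__(self):
        self.tail = Segment(0)
        self.segments = [self.tail]
        self.index = {}  # key -> segment holding the live version

    def put(self, key, value):
        # Overwriting a key turns its old copy into garbage in place.
        old = self.index.get(key)
        if old is not None:
            old.live_bytes -= len(old.records.pop(key))
        self.tail.records[key] = value
        self.tail.total_bytes += len(value)
        self.tail.live_bytes += len(value)
        self.index[key] = self.tail

    def roll_tail(self):
        # Start a new tail; older segments become compaction candidates.
        self.tail = Segment(len(self.segments))
        self.segments.append(self.tail)

    def maybe_compact(self):
        for seg in list(self.segments):
            if seg is self.tail or seg.total_bytes == 0:
                continue
            if seg.live_bytes / seg.total_bytes < COMPACT_THRESHOLD:
                # Copy the extant data to the tail, then delete the file.
                for key, value in list(seg.records.items()):
                    self.put(key, value)
                self.segments.remove(seg)
```

The point of the scheme is exactly what Robert describes: compaction still happens, but it touches one small file at a time instead of rewriting the whole database.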
>
> All that said, with enough databases and some scheduling, the current
> scheme is still pretty good.
>
> B.
>
> On Thu, Sep 23, 2010 at 5:11 PM, Paul Davis <[email protected]>
> wrote:
>> On Thu, Sep 23, 2010 at 12:00 PM, chongqing xiao <[email protected]> wrote:
>>> Hi, Paul:
>>>
>>> Thanks for the clarification.
>>>
>>> I am not sure why this is designed this way but here is one approach I
>>> think might work better
>>>
>>> Instead of appending the header to the data file, why not just move
>>> the header to a different file? The header file can be implemented as
>>> before - 2 duplicate header blocks to keep it
>>> corruption free. For performance reasons, the header file can be cached
>>> (say using a memory-mapped file).
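For what it's worth, the two-duplicate-header-blocks idea Chong mentions is usually done as a ping-pong: two fixed-size slots written alternately, each carrying a sequence number and checksum, so a torn write of one slot still leaves the other valid. A sketch, with slot layout and field names made up for illustration (this is not CouchDB's format):

```python
# Two fixed-size header slots written alternately; on read, take the
# slot with the highest sequence number whose checksum verifies.
import struct
import zlib

SLOT_SIZE = 4096


def encode_slot(seq, payload):
    body = struct.pack(">QI", seq, len(payload)) + payload
    crc = zlib.crc32(body)
    return (struct.pack(">I", crc) + body).ljust(SLOT_SIZE, b"\0")


def decode_slot(blob):
    crc, = struct.unpack_from(">I", blob, 0)
    seq, length = struct.unpack_from(">QI", blob, 4)
    body = blob[4:4 + 12 + length]
    if zlib.crc32(body) != crc:
        return None  # torn or corrupted slot
    return seq, body[12:]


class HeaderFile:
    """Alternates writes between slot 0 and slot 1."""

    def __init__(self):
        self.slots = [b"\0" * SLOT_SIZE, b"\0" * SLOT_SIZE]
        self.seq = 0

    def write_header(self, payload):
        self.seq += 1
        # Always overwrite the *older* slot, never the newest good one.
        self.slots[self.seq % 2] = encode_slot(self.seq, payload)

    def read_header(self):
        best = None
        for blob in self.slots:
            decoded = decode_slot(blob)
            if decoded and (best is None or decoded[0] > best[0]):
                best = decoded
        return best[1] if best else None
```

The trade-off Paul raises below still applies: this buys crash safety for the header file itself, but costs a second file descriptor per database and decouples the header from the data it describes.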
>>>
>>> The reason I like this approach better is that for the application I
>>> am interested in - archiving data from a relational database - the saved
>>> data never changes. So if no space is wasted on old headers,
>>> there is no need to compact the database file.
>>>
>>> Chong
>>>
>>
>> Writing the header to the data file means that the header is where the
>> data is. I.e., if the header is there and intact, we can be reasonably
>> sure that the data the header refers to is also there (barring weirdo
>> filesystems like XFS). Using a second file descriptor per database is
>> a 100% increase in the number of file descriptors, which would very
>> much affect people who have lots of active databases on a single
>> node. I'm sure there are other reasons but I've not had anything to
>> eat yet.
>>
>> Paul
>>
>>
>>
>>> On Thu, Sep 23, 2010 at 8:44 AM, Paul Davis <[email protected]>
>>> wrote:
>>>> It's not necessarily appended each time data is written. There are
>>>> optimizations to batch as many writes to the database together as
>>>> possible, as well as delayed commits, which write the header out
>>>> every N seconds.
>>>>
>>>> Remember that *any* write to the database is going to look like wasted
>>>> space. Even document deletes make the database file grow larger.
>>>>
>>>> When a header is written, it contains checksums of its contents, and
>>>> when reading we check that nothing has changed. There's an fsync
>>>> before and after writing the header, which also helps ensure that
>>>> writes succeed.
>>>>
>>>> As to the header2-or-header1 problem, if header2 appears to be
>>>> corrupted or is otherwise discarded, the header search just continues
>>>> through the file looking for the next valid header. In this case that
>>>> would mean that newData2 would not be considered valid data and would
>>>> be ignored.
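A minimal sketch of the commit/recovery scheme Paul describes: data is appended to the tail, then a checksummed header is appended (with an fsync before and after in the real thing); on open, we scan backward from the end for the newest header whose checksum verifies, and anything written after it is ignored. The record framing and magic marker below are made up for illustration, not CouchDB's on-disk format:

```python
# Append-only file as a byte buffer: data records followed by
# checksummed headers. Recovery scans backward for the last valid one.
import struct
import zlib

HEADER_MAGIC = b"HDRB"


def append_data(buf, payload):
    buf += struct.pack(">I", len(payload)) + payload
    return buf


def append_header(buf, root_offset):
    body = struct.pack(">Q", root_offset)
    crc = zlib.crc32(body)
    buf += HEADER_MAGIC + struct.pack(">I", crc) + body
    return buf


def find_last_valid_header(buf):
    # Scan backward for the magic marker; verify the checksum before
    # trusting the header. A torn header fails the CRC and we keep going.
    pos = len(buf)
    while True:
        pos = buf.rfind(HEADER_MAGIC, 0, pos)
        if pos < 0:
            return None
        crc, = struct.unpack_from(">I", buf, pos + 4)
        body = buf[pos + 8:pos + 16]
        if len(body) == 8 and zlib.crc32(body) == crc:
            root, = struct.unpack(">Q", body)
            return root
```

This is exactly why a torn header2 silently drops newData2 back to header1's view, as in the layout Chong gives below: the torn header fails its checksum, the scan keeps going, and header1 is the first one that verifies.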
>>>>
>>>> HTH,
>>>> Paul Davis
>>>>
>>>> On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao <[email protected]> wrote:
>>>>> Hi, Adam:
>>>>>
>>>>> Thanks for the answer.
>>>>>
>>>>> If that is how it works, that seems to create a lot of wasted space,
>>>>> assuming a new header has to be appended each time new data is saved.
>>>>>
>>>>> Also, assuming here is the data layout
>>>>>
>>>>> newData1 ->start
>>>>> header1
>>>>> newData2
>>>>> header2 -> end
>>>>>
>>>>> If header2 is partially written, I am assuming newData2 will also be
>>>>> discarded. If that is the case, I am assuming there is a special flag
>>>>> in header1 so the code can skip newData2 and find header1?
>>>>>
>>>>> I am very interested in CouchDB and I think it might be a very good
>>>>> choice for archiving relational data with some minor changes.
>>>>>
>>>>> Thanks
>>>>> Chong
>>>>>
>>>>> On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski <[email protected]>
>>>>> wrote:
>>>>>> Hi Chong, that's exactly right. Regards,
>>>>>>
>>>>>> Adam
>>>>>>
>>>>>> On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could anyone explain how write_header (or the header) works in CouchDB?
>>>>>>>
>>>>>>> When appending a new header, I am assuming it will be
>>>>>>> appended to the end of the DB file and the old header will be kept
>>>>>>> around?
>>>>>>>
>>>>>>> If that is the case, what will happen if the header is partially
>>>>>>> written? I am assuming the code will loop back and find the previous
>>>>>>> old header and recover from there?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Chong
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>