On Sep 23, 2010, at 12:25 PM, Robert Newson wrote:
> The idea also doesn't account for the waste in obsolete b+tree nodes.
> Basically, it's more complicated than that.
>
> Compaction is unavoidable with an append-only strategy. One idea I've
> pitched (and frankly stolen from Berkeley JE) is for the database file
> to be a series of files instead of a single one.
Bitcask takes a somewhat similar approach to JE in its use of multiple files.
> If we track the used
> space in each file, we can compact any file that drops below a
> threshold (by copying the extant data to the new tail and deleting the
> old file). This is still compaction but it's no longer a wholesale
> rewrite of the database.
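To make the threshold idea concrete, here is a minimal sketch of the "series of files" scheme — segment files with a per-file live-byte count, compacted when the live fraction drops below a cutoff. All class and field names here are illustrative, not CouchDB's (or JE's) actual internals:

```python
# Hypothetical sketch: the database is a sequence of append-only segment
# files; we track live bytes per segment and compact any segment whose
# live fraction drops below a threshold by copying its surviving records
# to the current tail segment and dropping the old file.

COMPACT_THRESHOLD = 0.25  # compact when under 25% of a segment is live


class Segment:
    def __init__(self, seg_id):
        self.seg_id = seg_id
        self.total_bytes = 0
        self.live_bytes = 0
        self.records = {}  # key -> value, standing in for file contents


class SegmentedDb:
    def __init__(self):
        self.tail = Segment(0)
        self.segments = [self.tail]
        self.index = {}  # key -> segment holding the live version

    def put(self, key, value):
        # Overwriting a key turns its old copy into garbage in place.
        old = self.index.get(key)
        if old is not None:
            old.live_bytes -= len(old.records.pop(key))
        self.tail.records[key] = value
        self.tail.total_bytes += len(value)
        self.tail.live_bytes += len(value)
        self.index[key] = self.tail

    def roll_tail(self):
        # Start a new tail; older segments become compaction candidates.
        self.tail = Segment(len(self.segments))
        self.segments.append(self.tail)

    def maybe_compact(self):
        for seg in list(self.segments):
            if seg is self.tail or seg.total_bytes == 0:
                continue
            if seg.live_bytes / seg.total_bytes < COMPACT_THRESHOLD:
                # Copy the extant data to the tail, then delete the file.
                for key, value in list(seg.records.items()):
                    self.put(key, value)
                self.segments.remove(seg)
```

The point of the scheme is exactly what Robert describes: compaction still happens, but it touches one small file at a time instead of rewriting the whole database.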
>
> All that said, with enough databases and some scheduling, the current
> scheme is still pretty good.
>
> B.
>
> On Thu, Sep 23, 2010 at 5:11 PM, Paul Davis <[email protected]>
> wrote:
>> On Thu, Sep 23, 2010 at 12:00 PM, chongqing xiao <[email protected]> wrote:
>>> Hi, Paul:
>>>
>>> Thanks for the clarification.
>>>
>>> I am not sure why this is designed this way but here is one approach I
>>> think might work better
>>>
>>> Instead of appending the header to the data file, why not just move
>>> the header to a different file? The header file can be implemented as
>>> before - 2 duplicate header blocks to keep it
>>> corruption free. For performance reasons, the header file can be cached
>>> (say using a memory-mapped file).
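For what it's worth, the two-duplicate-header-blocks idea Chong mentions is usually done as a ping-pong: two fixed-size slots written alternately, each carrying a sequence number and checksum, so a torn write of one slot still leaves the other valid. A sketch, with slot layout and field names made up for illustration (this is not CouchDB's format):

```python
# Two fixed-size header slots written alternately; on read, take the
# slot with the highest sequence number whose checksum verifies.
import struct
import zlib

SLOT_SIZE = 4096


def encode_slot(seq, payload):
    body = struct.pack(">QI", seq, len(payload)) + payload
    crc = zlib.crc32(body)
    return (struct.pack(">I", crc) + body).ljust(SLOT_SIZE, b"\0")


def decode_slot(blob):
    crc, = struct.unpack_from(">I", blob, 0)
    seq, length = struct.unpack_from(">QI", blob, 4)
    body = blob[4:4 + 12 + length]
    if zlib.crc32(body) != crc:
        return None  # torn or corrupted slot
    return seq, body[12:]


class HeaderFile:
    """Alternates writes between slot 0 and slot 1."""

    def __init__(self):
        self.slots = [b"\0" * SLOT_SIZE, b"\0" * SLOT_SIZE]
        self.seq = 0

    def write_header(self, payload):
        self.seq += 1
        # Always overwrite the *older* slot, never the newest good one.
        self.slots[self.seq % 2] = encode_slot(self.seq, payload)

    def read_header(self):
        best = None
        for blob in self.slots:
            decoded = decode_slot(blob)
            if decoded and (best is None or decoded[0] > best[0]):
                best = decoded
        return best[1] if best else None
```

The trade-off Paul raises below still applies: this buys crash safety for the header file itself, but costs a second file descriptor per database and decouples the header from the data it describes.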
>>>
>>> The reason I like this approach better is that for the application I
>>> am interested in - archiving data from a relational database - the saved
>>> data never changes. So if no space is wasted on old headers,
>>> there is no need to compact the database file.
>>>
>>> Chong
>>>
>>
>> Writing the header to the data file means that the header is where the
>> data is. I.e., if the header is there and intact, we can be reasonably
>> sure that the data the header refers to is also there (barring weirdo
>> filesystems like XFS). Using a second file descriptor per database is
>> a 100% increase in the number of file descriptors, which would very
>> much affect people who have lots of active databases on a single
>> node. I'm sure there are other reasons but I've not had anything to
>> eat yet.
>>
>> Paul
>>
>>
>>
>>> On Thu, Sep 23, 2010 at 8:44 AM, Paul Davis <[email protected]>
>>> wrote:
>>>> It's not necessarily appended each time data is written. There are
>>>> optimizations to batch as many writes to the database together as
>>>> possible, as well as delayed commits, which write the header out
>>>> every N seconds.
>>>>
>>>> Remember that *any* write to the database is going to look like wasted
>>>> space. Even document deletes make the database file grow larger.
>>>>
>>>> When a header is written, it contains checksums of its contents, and
>>>> when reading we check that nothing has changed. There's an fsync
>>>> before and after writing the header, which also helps ensure that
>>>> writes succeed.
>>>>
>>>> As to the header2-or-header1 problem, if header2 appears to be
>>>> corrupted or is otherwise discarded, the header search just continues
>>>> through the file looking for the next valid header. In this case that
>>>> would mean that newData2 would not be considered valid data and would
>>>> be ignored.
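A minimal sketch of the commit/recovery scheme Paul describes: data is appended to the tail, then a checksummed header is appended (with an fsync before and after in the real thing); on open, we scan backward from the end for the newest header whose checksum verifies, and anything written after it is ignored. The record framing and magic marker below are made up for illustration, not CouchDB's on-disk format:

```python
# Append-only file as a byte buffer: data records followed by
# checksummed headers. Recovery scans backward for the last valid one.
import struct
import zlib

HEADER_MAGIC = b"HDRB"


def append_data(buf, payload):
    buf += struct.pack(">I", len(payload)) + payload
    return buf


def append_header(buf, root_offset):
    body = struct.pack(">Q", root_offset)
    crc = zlib.crc32(body)
    buf += HEADER_MAGIC + struct.pack(">I", crc) + body
    return buf


def find_last_valid_header(buf):
    # Scan backward for the magic marker; verify the checksum before
    # trusting the header. A torn header fails the CRC and we keep going.
    pos = len(buf)
    while True:
        pos = buf.rfind(HEADER_MAGIC, 0, pos)
        if pos < 0:
            return None
        crc, = struct.unpack_from(">I", buf, pos + 4)
        body = buf[pos + 8:pos + 16]
        if len(body) == 8 and zlib.crc32(body) == crc:
            root, = struct.unpack(">Q", body)
            return root
```

This is exactly why a torn header2 silently drops newData2 back to header1's view, as in the layout Chong gives below: the torn header fails its checksum, the scan keeps going, and header1 is the first one that verifies.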
>>>>
>>>> HTH,
>>>> Paul Davis
>>>>
>>>> On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao <[email protected]> wrote:
>>>>> Hi, Adam:
>>>>>
>>>>> Thanks for the answer.
>>>>>
>>>>> If that is how it works, that seems to create a lot of wasted space,
>>>>> assuming a new header has to be appended each time new data is saved.
>>>>>
>>>>> Also, assuming here is the data layout
>>>>>
>>>>> newData1 ->start
>>>>> header1
>>>>> newData2
>>>>> header2 -> end
>>>>>
>>>>> If header2 is partially written, I am assuming newData2 will also be
>>>>> discarded. If that is the case, I am assuming there is a special flag
>>>>> in header1 so the code can skip newData2 and find header1?
>>>>>
>>>>> I am very interested in CouchDB and I think it might be a very good
>>>>> choice for archiving relational data with some minor changes.
>>>>>
>>>>> Thanks
>>>>> Chong
>>>>>
>>>>> On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski <[email protected]>
>>>>> wrote:
>>>>>> Hi Chong, that's exactly right. Regards,
>>>>>>
>>>>>> Adam
>>>>>>
>>>>>> On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could anyone explain how write_header (or the header) works in CouchDB?
>>>>>>>
>>>>>>> When appending a new header, I am assuming it will be
>>>>>>> appended to the end of the DB file and the old header will be kept
>>>>>>> around?
>>>>>>>
>>>>>>> If that is the case, what will happen if the header is partially
>>>>>>> written? I am assuming the code will loop back and find the previous
>>>>>>> old header and recover from there?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Chong
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>