>>> I don't really know how Firebird handles the page precedence graph,
>>> but if we could write all the "independent" pages (say, the ones whose
>>> write order doesn't matter), then call fdatasync and continue, that would
>>> be much better than O_DSYNC mode.
>>>
>>> Is that possible? What does the precedence graph generally look like?
>>> Does it have many "independent" pages in it?
>>
>> There are two general places where Firebird writes pages. Both of them,
>> of course, use the precedence graph. See below:
>>
>> a) flush on commit\rollback\detach\background_gc :
>>
>> Pages (bdb's) to be flushed are collected into an array, the array is
>> sorted by page number, and then the pages which have no dependent pages
>> are written (in page number order). Of course, after a page is written,
>> its dependency is cleared, so after the first write pass we have another
>> set of "independent" pages. This process continues until all pages from
>> the array are written.
>>
>> b) "single" page write when an old dirty page is evicted from the page cache
>>
>> The page cache chooses the oldest dirty page and attempts to write it.
>> But if there are other dirty pages which are dependent on the "victim"
>> page, they will be written before the "victim" page.
>>
>> This is a recursive process, as the page currently being written could
>> have other dependent pages which must be written first.
>>
>> The same "single" write also occurs when:
>> - a page lock is downgraded and the page is dirty (Classic only)
>> - a new precedence relationship would create a cycle in the precedence graph
>> - a dirty page buffer is marked as system (or must_write) and the page is released
>> - the cache_writer thread (SS only) writes the oldest dirty page
>>
>> You see, in case (a) we have more or less group writes, while in case (b)
>> we have mostly single writes (but in some unfortunate cases it could
>> affect many pages). Case (a) is optimized, and we can call fdatasync()
>> after each pass of writes of independent pages. In case (b) we have no
>> such possibility, at least in the current code.
>>
>
> So I suppose that if fdatasync does not interfere when used from multiple
> threads, it may not make case (b) noticeably slower, but would make case
> (a) much faster.
If your guess is correct (that O_DSYNC makes a barrier after every single
write), then yes, it should be better to remove O_DSYNC and add fdatasync()
after every pass of the flush of a group of independent pages.
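
To make the idea concrete, here is a minimal sketch of the multi-pass flush
from case (a) with one sync per pass. It is not the Firebird code: the names
flush_in_passes, write_first, write_page and sync are hypothetical, and the
sketch assumes that a page which is not in the dirty set is already on disk.
In real code sync would be something like fdatasync() on the database file.

```python
def flush_in_passes(dirty, write_first, write_page, sync):
    """Write dirty pages in passes of "independent" pages, syncing once per pass.

    dirty       -- iterable of dirty page numbers (hypothetical input)
    write_first -- dict: page -> set of pages that must reach disk before it
    write_page  -- callback that writes one page (no sync implied)
    sync        -- callback acting as the barrier, e.g. fdatasync on the file
    Returns the list of passes, each a list of page numbers in write order.
    """
    pending = sorted(dirty)          # pages are written in page number order
    passes = []
    while pending:
        pending_set = set(pending)
        # A page is independent when nothing it waits for is still pending.
        ready = [p for p in pending
                 if not (write_first.get(p, set()) & pending_set)]
        if not ready:
            # Assumption: the graph is kept acyclic, so this should not happen.
            raise RuntimeError("cycle in precedence graph")
        for p in ready:
            write_page(p)
        sync()                       # one barrier per pass, not per page
        passes.append(ready)
        ready_set = set(ready)
        pending = [p for p in pending if p not in ready_set]
    return passes
```

With N dirty pages and a precedence chain of depth D this issues D syncs
instead of the N implied barriers we would pay if every write were
synchronous.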
> Does (b) happen only when there is no free buffer for another page?
I enumerated 5 cases above.
> Could it be changed to write out not just one independent dirty page, but
> a number of them in one pass?
I have thought about changing cache_writer in this direction: making it do a
"mini-flush" instead of writing one page at a time. But note that we have no
cache_writer thread in CS\SC, and we can do little (almost nothing) when a
page lock is downgraded (a frequent case in CS, I think).
I don't think that writing more than one dirty page when a free buffer is
needed is a good idea, as it could delay the user process significantly.
I could be wrong...
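
For reference, the "single" write of case (b) can be sketched roughly like
this. Again this is only an illustration, not the Firebird code: write_victim,
write_first and write_page are hypothetical names, and the sketch assumes the
graph has no cycles at the moment of the write. It shows why one "single"
write can, in the unfortunate case, touch many pages.

```python
def write_victim(page, write_first, dirty, write_page, _in_progress=None):
    """Write one dirty page, first writing the dirty pages that must precede it.

    page        -- the "victim" page number chosen for writing
    write_first -- dict: page -> set of pages that must reach disk before it
    dirty       -- mutable set of dirty page numbers, updated in place
    write_page  -- callback that writes one page
    Returns the list of pages actually written, in write order.
    """
    if _in_progress is None:
        _in_progress = set()
    written = []
    if page not in dirty:
        return written               # clean pages need no write
    if page in _in_progress:
        # Assumption: cycles are broken before they are created.
        raise RuntimeError("cycle in precedence graph")
    _in_progress.add(page)
    for dep in sorted(write_first.get(page, ())):
        # Recursive step: each predecessor may have predecessors of its own.
        written += write_victim(dep, write_first, dirty, write_page, _in_progress)
    write_page(page)
    dirty.discard(page)              # the page is clean once written
    _in_progress.discard(page)
    written.append(page)
    return written
```

Here every write has to be durable before the next one starts, so there is
no natural place for a single group sync as in case (a).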
Regards,
Vlad
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel