> Let's recall the problem I originally observed. By default ext4 uses
> barriers, and that makes it much slower than ext3 (while ext3 wasn't
> using barriers by default).
>
> But it is slower only when FW=ON (Forced Writes). With FW=OFF there is
> almost no difference.
>
> Barriers are supposed to act on a file's metadata only, but the
> slowdown we see suggests that isn't the whole story.
>
> Firebird doesn't use O_SYNC but O_DSYNC instead. O_DSYNC is supposed to
> sync the data and only the metadata necessary to retrieve that data
> correctly.
>
> O_DSYNC and O_SYNC don't always differ in implementation, but at least
> in kernel 2.6.38 the implementations are definitely different, and
> O_DSYNC is faster.
>
> So for a database which is already preallocated on the filesystem,
> where Firebird doesn't create any new file pages, barriers should not
> influence performance. But it's slower even under this condition.
>
> So I think not everything about this problem is well understood, but I
> did some tests. One thing I saw is that opening a file with O_DSYNC,
> and opening it without that flag but calling fdatasync() after each
> write, have almost equal performance.
Did you try a single connection only, or also concurrent writes from
multiple connections?
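
For reference, I read the test above as something like this (a minimal
sketch; file names, page size and page count are made up, error handling
omitted):

#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096

/* Strategy 1: O_DSYNC - every write() returns only after the data
   (plus the metadata needed to read it back) reaches stable storage. */
static void odsync_writes(const char *path, int npages)
{
    char page[PAGE_SIZE];
    memset(page, 0xAB, sizeof page);

    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
    for (int i = 0; i < npages; i++)
        write(fd, page, sizeof page);
    close(fd);
}

/* Strategy 2: plain writes, plus an explicit fdatasync() after each
   one - the test quoted above found the two almost equal. */
static void fdatasync_writes(const char *path, int npages)
{
    char page[PAGE_SIZE];
    memset(page, 0xCD, sizeof page);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    for (int i = 0; i < npages; i++) {
        write(fd, page, sizeof page);
        fdatasync(fd);
    }
    close(fd);
}

int main(void)
{
    odsync_writes("odsync.tmp", 1000);       /* time these two ...   */
    fdatasync_writes("fdatasync.tmp", 1000); /* ... e.g. with time(1) */
    return 0;
}

If O_DSYNC and fdatasync() end up in the same kernel sync path,
near-identical timings are exactly what one would expect.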
> I don't know well how Firebird handles the page precedence graph, but
> if we could write all "independent" pages (say, the ones whose write
> order doesn't matter), then call fdatasync() and continue, that would
> be much better than O_DSYNC mode.
>
> Is it possible? What does that precedence graph generally look like?
> Does it have many "independent" pages in it?
There are two general places where Firebird writes pages. Both of them,
of course, use the precedence graph. See below:
a) flush on commit/rollback/detach/background_gc:
Pages (bdb's) to be flushed are collected into an array, the array is
sorted by page number, and then the pages which have no dependent pages
are written (in page-number order). Of course, after a page is written
its dependencies are cleared, so after the first write pass we have
another set of "independent" pages. This process continues until all
pages in the array are written.
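
In rough C-style pseudocode (all identifiers here are made up for
illustration; they are not the actual Firebird names):

/* Sketch of case (a): repeated passes over a sorted array, writing
   only the pages that currently have no dependents. */

typedef struct bdb bdb;                    /* a page buffer */
int  has_dependents(bdb *p);               /* placeholder */
void write_page(bdb *p);                   /* placeholder */
void clear_precedence(bdb *p);             /* placeholder */
void sort_by_page_number(bdb **a, int n);  /* placeholder */

void flush_pages(bdb **pages, int count)
{
    sort_by_page_number(pages, count);

    int remaining = count;
    while (remaining > 0) {
        /* one pass: write every currently "independent" page,
           in page-number order */
        for (int i = 0; i < count; i++) {
            if (pages[i] && !has_dependents(pages[i])) {
                write_page(pages[i]);
                clear_precedence(pages[i]);  /* may free other pages */
                pages[i] = NULL;
                remaining--;
            }
        }
        /* one fdatasync() per pass could go right here - that's the
           optimization mentioned at the end of this mail */
    }
}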
b) "single" page write when old dirty page is eliminated from the page cache
Page cache choosed oldest dirty page and attempt to write it. But if there
is another
dirty pages which is dependent on "victim" page - they will be written before
"victim" page.
This is recursive process as currently to be written page could have
another dependent
pages which must be written first.
Same "single" write occured also when :
- page lock is downgraded and page is dirty (classic only)
- new precedence relationship will make a circle in precedence graph
- dirty page buffer is marked as system (or must_write) and page is released
- cache_writer thread (SS only) writes an oldest dirty page
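
Schematically, case (b) looks like this (again, the names are made up
for illustration):

/* Sketch of case (b): the "victim" page can only be written after
   every dirty page depending on it has been written, recursively. */

typedef struct bdb bdb;                   /* a page buffer */
int  is_dirty(bdb *p);                    /* placeholder */
bdb *next_dependent(bdb *p, void **it);   /* placeholder iterator */
void write_page(bdb *p);                  /* placeholder */

void write_victim(bdb *victim)
{
    void *it = NULL;
    bdb  *dep;

    /* first write (recursively) every dirty page that depends
       on the victim ... */
    while ((dep = next_dependent(victim, &it)) != NULL)
        if (is_dirty(dep))
            write_victim(dep);

    /* ... only then the victim itself */
    write_page(victim);
}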
You see, in case (a) we have more or less group writes, while in case
(b) we have mostly single writes (although in some unfortunate cases it
could affect many pages). Case (a) is optimized, and there we could
call fdatasync() after each pass of writes of independent pages. In
case (b) we have no such possibility, at least in the current code.
Hope this helps,
Vlad