29.05.2017 23:45, Leyne, Sean wrote:
Vlad,
- Asynchronous File I/O
It is not really asynchronous as it waits for the completion of
every single IO request.
True, but it allows the storage controller to decide the best order in which
to perform the operations...
Order of what? IO requests are queued serially by the same thread. It is
exactly the same as without this code ;)
You are referring to how the application would invoke the Async IO and provide
the Overlapped IO structure.
I am speaking about the subject: SUPERSERVER_V2
I am referring to the fact that actual order of execution of the IOs within the
Overlapped IO structure is not guaranteed.
The storage controller is free to process the IO request in any order it wants, which
totally destroys the "carefully ordered write" basis that maintains the FB
database integrity.
That is possible only if the engine posts the write request for the low-order
page without waiting for the write of the high-order page to complete. I.e. it
is easy to preserve careful write even in the overlapped IO case: just don't
mix pages of different precedence levels in one IO request, even if they are
physically consecutive.
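
The rule above (never mix precedence levels in one IO request, even for
physically consecutive pages) can be sketched as a small batching routine.
This is only an illustration, not engine code: the `(page_number, level)`
tuples, the function names, and the convention that a higher level number
must reach disk first are all assumptions made for the example.

```python
from typing import List, Tuple

# (page_number, precedence_level): hypothetical fields for this sketch.
# Assumption: a higher level number means "must be on disk earlier".
Page = Tuple[int, int]


def batch_writes(dirty: List[Page]) -> List[List[Page]]:
    """Group dirty pages into IO requests.

    A batch only ever holds physically consecutive pages that share one
    precedence level; a change of level or a gap starts a new batch.
    """
    batches: List[List[Page]] = []
    for page in sorted(dirty):  # sort by page number
        last = batches[-1][-1] if batches else None
        if last is not None and page[0] == last[0] + 1 and page[1] == last[1]:
            batches[-1].append(page)
        else:
            batches.append([page])
    return batches


def write_order(dirty: List[Page]) -> List[List[Page]]:
    """Order the batches so that higher-level batches are issued first.

    The caller is expected to wait for all batches of one level to complete
    before posting the next level.
    """
    return sorted(batch_writes(dirty), key=lambda batch: -batch[0][1])
```

With pages 10-13 physically consecutive but split across two levels,
`write_order([(10, 1), (11, 1), (12, 2), (13, 2)])` yields two requests of two
pages each rather than one request of four.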
I could see a benefit for writing several pages, which are of the same
'priority/level' in the "carefully order writes", through a single operation for
any storage device. Fewer calls would improve performance.
Sure. But it is not present in SUPERSERVER_V2 (which is what we are discussing).
Agreed. That functionality is not in SUPERSERVER_V2.
I wasn't trying to suggest that the functionality was present, rather, the
current V2 functionality is missing this important (IMO) piece (see above).
My point is that the IO part of SUPERSERVER_V2 is not complete and contains no
useful code.
And you miss one important word: consecutive. To write a few pages at once
they must be consecutive in physical order.
I don't think I missed anything.
Async/Overlapped IO allows for IO on any number of file blocks (aka pages)
without limit to their locations, consecutive or not.
Your words "single operation for any storage device" make me think that you
are referring to a single OS call. There is no such API on the "only platform
that matters" which allows posting many IO requests in one call (*nix has it,
btw). Note, a write of 10 consecutive pages in one call to WriteFileXXX is
still a single IO request from the application (engine) POV.
If you are speaking about something else, please explain.
Interesting that you want to continue with it...
Why throw away good ideas ? :)
See above about order of execution of Overlapped IOs
I see no relation, explain, pls
Although, I see the benefits of avoiding the "sun level hotspot" that is the
DB Header page write operations.
I don't see how database integrity can be maintained if the header page
changes are not persisted to disk immediately -- aside from an MPI based
multi-node cluster where pages changes are sent to other nodes (as
witnesses for safekeeping)*.
The idea is to defer the header page write until the write of any other page,
in the hope that other transactions could start in between.
But if the write is deferred to the start of the next transaction...
The header page write is deferred to the write of any other page, not to the
start of the next transaction
How would the database know what the last committed transaction was?
If a few transactions started and incremented the Next counter in memory only,
and there were no page writes in the meantime, what is the problem? No
transactions were committed (as there were no page writes). Nobody will ever
know that those transactions existed (except their users). There are no
visible effects in the database for other users.
Wouldn't this detail be required on server restart, if the server abended/was
killed right after the transaction write?
What detail? The last committed tx? It is fixed in the TIP, and that requires
a page write, which will force the Header page to disk before the TIP page is
written.
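
The deferred header write described above can be modelled with a toy class.
This is a minimal sketch of the idea as stated in this thread, not the engine's
implementation: the class, its attributes, and the page names are hypothetical.

```python
class Database:
    """Toy model: the header page is flushed lazily, just before any other page."""

    def __init__(self) -> None:
        self.next_in_memory = 0  # Next transaction counter, kept in memory
        self.next_on_disk = 0    # value last persisted in the header page
        self.disk_log = []       # order in which pages reached "disk"

    def start_transaction(self) -> int:
        # Only the in-memory counter changes; no header page write here.
        self.next_in_memory += 1
        return self.next_in_memory

    def write_page(self, name: str) -> None:
        # A stale header is forced to disk before any other page is written,
        # so the on-disk Next value is never behind what other pages imply.
        if self.next_on_disk != self.next_in_memory:
            self.next_on_disk = self.next_in_memory
            self.disk_log.append("header")
        self.disk_log.append(name)
```

Starting three transactions and then writing a TIP page results in a single
header write followed by the TIP write; if the server dies before any page
write, nothing on disk refers to those transactions.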
* This approach could actually provide several possible benefits...
To be clear, my proposal would be to use RDMA solutions to defer all writes, by
"posting" the writes to other nodes and then performing all writes in
background on the main server.
In this way, the main server would not need to wait for Write IOs.
It is not possible to completely remove the need to wait. One needs to wait
for completion of a page write before marking this page as dirty again. I.e.
writers must wait for each other.
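
The constraint above (a page cannot be marked dirty again until its in-flight
write has completed) can be illustrated with a condition variable. This is a
sketch under assumptions: `PageBuffer` and its methods are hypothetical names,
and a real buffer manager would track considerably more state.

```python
import threading


class PageBuffer:
    """Toy page buffer: modifiers must wait while a write is in flight."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._write_in_flight = False
        self.dirty = False

    def mark_dirty(self, timeout=None) -> bool:
        """Wait until no write is in flight, then mark the page dirty.

        Returns False if the timeout expires while a write is still pending.
        """
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._write_in_flight, timeout)
            if ok:
                self.dirty = True
            return ok

    def begin_write(self) -> None:
        # The page content is handed to the IO layer (or posted to another node).
        with self._cond:
            self._write_in_flight = True

    def complete_write(self) -> None:
        # The write is durable: the page is clean again and waiters may proceed.
        with self._cond:
            self._write_in_flight = False
            self.dirty = False
            self._cond.notify_all()
```

Deferring the physical write to a background thread shortens this wait but does
not remove it: a second modifier still blocks in `mark_dirty` until
`complete_write` is called.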
(A possible side benefit of this approach, would be physical database
replication by having the other nodes write the pages to local storage)
Sure
Regards,
Vlad
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel