29.05.2017 23:45, Leyne, Sean wrote:
Vlad,

- Asynchronous File I/O

     It is not really asynchronous as it waits for the completion of
every single IO request.

True, but it allows the storage controller to decide the best order in which
to perform the operations...


    Order of what? IO requests are queued serially by the same thread. It is
exactly the same as without this code ;)

You are referring to how the application would invoke the Async IO and provide 
the Overlapped IO structure.

  I am talking about the subject of this thread: SUPERSERVER_V2.

I am referring to the fact that actual order of execution of the IOs within the 
Overlapped IO structure is not guaranteed.

The storage controller is free to process the IO request in any order it wants, which 
totally destroys the "carefully ordered write" basis that maintains the FB 
database integrity.

  That is possible only if the engine posts a write request for the low order
page without waiting for the write of the high order page to complete. I.e. it
is easy to preserve careful write even in the overlapped IO case: just don't
mix pages of different precedence levels in one IO request, even if they are
physically consecutive.
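
  A minimal sketch of that rule, for illustration only. It is not engine code:
the `DirtyPage` type, its numeric `level` field and the `batch_writes` helper
are hypothetical names, and a single numeric level is an assumed simplification
of page precedence. A batch grows only while the next page is physically
adjacent and has the same level.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DirtyPage:
    page_no: int  # physical page number in the database file
    level: int    # hypothetical precedence level; higher must reach disk first


def batch_writes(pages: List[DirtyPage]) -> List[List[DirtyPage]]:
    """Split dirty pages into IO requests, in the order they should be posted.

    Higher levels come first. Within a level, a batch is extended only while
    the next page is physically consecutive, so one request never mixes levels.
    """
    batches: List[List[DirtyPage]] = []
    ordered = sorted(pages, key=lambda p: (-p.level, p.page_no))
    for page in ordered:
        last = batches[-1] if batches else None
        if (last is not None
                and last[-1].level == page.level
                and last[-1].page_no + 1 == page.page_no):
            last.append(page)       # same level and adjacent: same request
        else:
            batches.append([page])  # level change or gap: new request
    return batches
```

  In this sketch the caller would still have to wait for all requests of one
level to complete before posting the next level; the helper only decides which
pages may share a request.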


I could see a benefit for writing several pages, which are of the same
'priority/level' in the "carefully order writes", through a single operation for
any storage device.  Fewer calls would improve performance.

    Sure. But it is not present in SUPERSERVER_V2 (which is what we are talking about).

Agreed.  That functionality is not in SUPERSERVER_V2.

I wasn't trying to suggest that the functionality was present, rather, the 
current V2 functionality is missing this important (IMO) piece (see above).

  My point is that the IO part of SUPERSERVER_V2 is not complete and contains
no useful code.

And you missed one important word: consecutive. To write several pages at
once, they must be consecutive in physical order.

I don't think I missed anything.

Async/Overlapped IO allows for IO on any number of file blocks (aka pages) 
without limit to their locations, consecutive or not.

  Your words "single operation for any storage device" make me think that you
are referring to a single OS call. There is no such API on the "only platform
that matters" which allows posting many IO requests in one call (*nix has it,
btw). Note, a write of 10 consecutive pages in one call to WriteFileXXX is
still a single IO request from the application (engine) POV.

  If you are talking about something else, please explain.

Interesting that you want to continue with it...

    Why throw away good ideas ? :)

See above about order of execution of Overlapped IOs

  I see no relation; please explain.

Although, I see the benefits of avoiding the "sun level hotspot" that is the
DB Header page write operations.

I don't see how database integrity can be maintained if the header page
changes are not persisted to disk immediately -- aside from an MPI based
multi-node cluster where page changes are sent to other nodes (as
witnesses for safekeeping)*.

    The idea is to defer the header page write until the write of any other
page, in the hope that other transactions could start in between.

But if the write is deferred to the start of the next transaction...

  The header page write is deferred to the write of any other page, not to the
start of the next transaction.

How would the database know what the last committed transaction was?

  If a few transactions started and incremented the Next counter in memory
only, and there were no page writes in the meantime, what is the problem? No
transactions were committed (as there were no page writes). Nobody will ever
know that those transactions existed (except their users). There are no
visible effects in the database for other users.


Wouldn't this detail be required on server restart, if the server abended/was 
killed right after the transaction write?

  What detail? The last committed tx? It is recorded in the TIP, and that
requires a page write, which forces the header page to disk before the TIP
page is written.
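
  A toy model of that ordering, as an illustrative sketch only. `PageCache`,
`start_transaction` and `write_page` are made-up names, not engine functions,
and the list `disk_log` stands in for the disk. Starting a transaction only
bumps the in-memory counter; any real page write flushes the pending header
page first.

```python
class PageCache:
    """Toy model: the header write is deferred until another page is written."""

    def __init__(self) -> None:
        self.next_tx = 0           # in-memory Next transaction counter
        self.header_dirty = False  # header changed but not yet on "disk"
        self.disk_log = []         # order in which writes reach "disk"

    def start_transaction(self) -> int:
        self.next_tx += 1
        self.header_dirty = True   # no IO here, only remember the change
        return self.next_tx

    def write_page(self, name: str) -> None:
        # Any page write forces the deferred header page out first.
        if self.header_dirty:
            self.disk_log.append(("header", self.next_tx))
            self.header_dirty = False
        self.disk_log.append((name, None))


cache = PageCache()
for _ in range(3):
    cache.start_transaction()  # three starts, zero writes so far
cache.write_page("TIP")        # header goes out once, before the TIP page
```

  In this model, a crash before the first `write_page` call leaves nothing on
disk that refers to the three started transactions.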

* This approach could actually provide several possible benefits...

To be clear, my proposal would be to use RDMA solutions to defer all writes, by 
"posting" the writes to other nodes and then performing all writes in 
background on the main server.

In this way, the main server would not need to wait for Write IOs.

  It is not possible to completely remove the need to wait. One needs to wait
for completion of a page write before marking that page as dirty again. I.e.
writers must wait for each other.

(A possible side benefit of this approach, would be physical database 
replication by having the other nodes write the pages to local storage)

  Sure

Regards,
Vlad
