Vlad,

> >>> - Asynchronous File I/O
> >>
> >>     It is not really asynchronous as it waits for the completion of
> >> every single IO request.
> >
> > True, but it allows the storage controller to decide the best order in
> > which to perform the operations...
> 
> 
>    Order of what ? IO requests are queued serially by the same thread. It is
> exactly the same as without this code ;)

You are referring to how the application would invoke the Async IO and provide 
the Overlapped IO structure.

I am referring to the fact that the actual order in which the queued 
Overlapped IO requests are executed is not guaranteed.

The storage controller is free to process the IO requests in any order it 
wants, which totally destroys the "carefully ordered write" basis that 
maintains the FB database integrity.
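
As a rough illustration of the ordering concern, here is a minimal sketch in 
Python. It uses POSIX-style os.pwrite/os.fsync rather than Win32 Overlapped 
IO, and the page size, page numbers and function names are all made up for 
the example (none of this is Firebird code). The point is only that a 
dependent write is issued after the write it depends on has been flushed, 
instead of both being queued and left to the controller:

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, for illustration only


def write_page(fd, page_no, payload):
    """Write one fixed-size page at its offset in the file."""
    page = payload.ljust(PAGE_SIZE, b"\0")
    os.pwrite(fd, page, page_no * PAGE_SIZE)


def read_page(fd, page_no):
    """Read one page back and strip the zero padding."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE).rstrip(b"\0")


def careful_write(fd, data_page_no, data, pointer_page_no, pointer):
    """Persist the data page before the page that points to it."""
    write_page(fd, data_page_no, data)
    os.fsync(fd)  # barrier: the data page is flushed before we continue
    write_page(fd, pointer_page_no, pointer)
    os.fsync(fd)  # the pointer page is flushed only afterwards


fd, path = tempfile.mkstemp()
try:
    careful_write(fd, 5, b"record data", 1, b"points to page 5")
    data_on_disk = read_page(fd, 5)
    pointer_on_disk = read_page(fd, 1)
finally:
    os.close(fd)
    os.remove(path)
```

If both writes were simply queued together with no barrier between them, 
nothing would stop the pointer page from reaching the disk first.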


> > I could see a benefit for writing several pages, which are of the same
> > 'priority/level' in the "carefully order writes", through a single
> > operation for any storage device.  Fewer calls would improve performance.
> 
>    Sure. But it is not present in SUPERSERVER_V2 (which we speak about).

Agreed.  That functionality is not in SUPERSERVER_V2.

I wasn't trying to suggest that the functionality was present; rather, that 
the current V2 functionality is missing this important (IMO) piece (see above).


> And you miss one important word - consecutive. To write few pages at once
> they must be consecutive in physical order.

I don't think I missed anything.

Async/Overlapped IO allows for IO on any number of file blocks (aka pages), 
regardless of their locations, consecutive or not.
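
A minimal sketch of what that could look like, with the caveat that it uses 
a Python thread pool and os.pwrite as a stand-in for real Overlapped IO, and 
that the page size and page numbers are invented for the example. Several 
non-consecutive pages of the same level are submitted together, complete in 
whatever order the OS and device choose, and are followed by a single flush:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed

PAGE_SIZE = 4096  # assumed page size, for illustration only


def write_page(fd, page_no, payload):
    """Write one page at its own offset; returns the page number."""
    os.pwrite(fd, payload.ljust(PAGE_SIZE, b"\0"), page_no * PAGE_SIZE)
    return page_no


# Non-consecutive page numbers, all assumed to be at the same write level.
pages = {3: b"page three", 17: b"page seventeen", 8: b"page eight"}

fd, path = tempfile.mkstemp()
try:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(write_page, fd, n, p) for n, p in pages.items()]
        # Completion order is not guaranteed to match submission order.
        completed = [f.result() for f in as_completed(futures)]
    os.fsync(fd)  # one barrier for the whole batch
    on_disk = {
        n: os.pread(fd, PAGE_SIZE, n * PAGE_SIZE).rstrip(b"\0") for n in pages
    }
finally:
    os.close(fd)
    os.remove(path)
```

This is only safe because no page in the batch depends on another; pages at 
different levels would still need a barrier between the batches.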


> > Interesting that you want to continue with it...
> 
>    Why throw away good ideas ? :)

See above about the order of execution of Overlapped IOs.


> > Although, I see the benefits of avoiding the "sun level hotspot" that is
> > the DB Header page write operations.
> >
> > I don't see how database integrity can be maintained if the header page
> > changes are not persisted to disk immediately -- aside from an MPI based
> > multi-node cluster where pages changes are sent to other nodes (as
> > witnesses for safekeeping)*.
> 
>    The idea is to defer header page write up to the write of the any other
> page in a hope that other transactions could start in between.

But if the write is deferred to the start of the next transaction...  How would 
the database know what the last committed transaction was?

Wouldn't this detail be required on server restart, if the server abended/was 
killed right after the transaction write?


> > * This approach could actually provide several possible benefits...

To be clear, my proposal would be to use RDMA solutions to defer all writes, by 
"posting" the writes to other nodes and then performing all writes in 
background on the main server.

In this way, the main server would not need to wait for Write IOs. 

(A possible side benefit of this approach would be physical database 
replication, by having the other nodes write the pages to local storage.)
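
To make the shape of that proposal a bit more concrete, here is a toy, 
single-process model in Python. Everything in it is hypothetical: the 
WitnessNode and MainServer classes, the dict that stands in for the database 
file, and the plain method call that stands in for an RDMA post. It only 
shows the control flow, where a write returns once the witnesses hold the 
page and the local write happens later on a background thread:

```python
import queue
import threading


class WitnessNode:
    """Hypothetical stand-in for a remote node that receives posted pages."""

    def __init__(self):
        self.pages = {}

    def post(self, page_no, data):
        # A real system would do an RDMA write here; this is a dict update.
        self.pages[page_no] = data


class MainServer:
    """Toy main server: posts pages to witnesses, writes locally later."""

    def __init__(self, witnesses):
        self.witnesses = witnesses
        self.local_storage = {}  # stands in for the database file
        self.pending = queue.Queue()
        self.writer = threading.Thread(target=self._drain, daemon=True)
        self.writer.start()

    def write_page(self, page_no, data):
        """Return once witnesses hold the page; the local write is deferred."""
        for node in self.witnesses:
            node.post(page_no, data)
        self.pending.put((page_no, data))

    def _drain(self):
        # Background writer: applies queued pages in submission order.
        while True:
            page_no, data = self.pending.get()
            self.local_storage[page_no] = data
            self.pending.task_done()

    def flush(self):
        """Block until every queued page has been written locally."""
        self.pending.join()


witnesses = [WitnessNode(), WitnessNode()]
server = MainServer(witnesses)
server.write_page(0, b"header")
server.write_page(42, b"data")
server.flush()
```

After a crash of the main server, the witnesses would still hold every 
posted page, which is also what would make them usable as physical replicas.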


Sean

------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
