Re: [Firebird-devel] SUPERSERVER_V2
30.05.2017 19:32, Leyne, Sean wrote:

>>> Async/Overlapped IO allows for IO on any number of file blocks (aka pages) without limit to their locations, consecutive or not.
>>
>> Your words "single operation for any storage device" make me think that you are referring to a single OS call. There is no such API on the "only platform that matters" which allows posting many IO requests in one call (*nix has it, btw). Note, a write of 10 consecutive pages in one call to WriteFileXXX is still a single IO request from the application (engine) POV.
>
> Windows does have the ability to post many IO requests in a single call -- see Overlapped IO structure.

I already wrote that one big request to write many consecutive pages is still one single IO request from the app point of view. One OVERLAPPED structure can't post more than one IO request.

> Regarding "consecutive pages" do you mean that the pages fall one after the other, or that 10 pages are written consecutively?

I don't know what "page fall" is, so - yes - I speak about the physical order of pages on disk.

>>>>> I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.
>>>>
>>>> The idea is to defer the header page write up to the write of any other page, in the hope that other transactions could start in between.
>>>
>>> But if the write is deferred to the start of the next transaction...
>>
>> The header page write is deferred to the write of any other page, not to the start of the next transaction.
>>
>>> How would the database know what the last committed transaction was?
>>
>> If a few transactions started and incremented the Next counter in memory only, and there were no page writes in the meantime - what is the problem? No transactions were committed (as there were no page writes). Nobody ever knows that the transactions existed (except their users). There are no visible effects in the database for the other users.

> What do page writes have to do with transaction commits?

A transaction on commit (rollback) writes all pages it marked as dirty to disk. Commit/rollback returns to the user after the OS has confirmed that all such pages are written to disk.

> What about the data changes that those transactions could apply to database pages, before the transaction is committed? They would be written to disk, no?

Yes.

> If the engine dies before those transactions are committed. When the engine restarts, how would it clean up the incomplete changes?

In the TIP those transactions will still be marked as active. The engine will detect their real state (dead) and undo their changes.

>>> Wouldn't this detail be required on server restart, if the server abended/was killed right after the transaction write?
>>
>> What detail? Last committed tx? It is fixed in the TIP and requires a page write, which will force the Header page to disk before the TIP page is written.
>
> But you are proposing to delay the Header page write, no?

Yes.

> If the Header needs to be written before the TIP can be written and the TIP provides the details about the last committed Tx, how would it be possible to defer Header writes?

Currently:

tx1   starts
        fetch Header page
tx2   starts
        fetch Header page
        waiting...
tx1     increment Next
        write Header page
        release Header page
tx2     ...Header page fetched
        increment Next
        write Header page
        release Header page
tx1   commit
        write all dirty pages marked by tx1
        fetch TIP page
        mark tx1 as committed
        write TIP page
        release TIP page
tx2   commit
        write all dirty pages marked by tx2
        fetch TIP page
        mark tx2 as committed
        write TIP page
        release TIP page

Will be:

tx1   starts
        fetch Header page
tx2   starts
        fetch Header page
        waiting...
tx1     increment Next
        release Header page
tx2     ...Header page fetched
        increment Next
        release Header page
tx1   commit
        write all dirty pages marked by tx1
          before writing the first dirty page, write Header page
        fetch TIP page
        mark tx1 as committed
        write TIP page
        release TIP page
tx2   commit
        write all dirty pages marked by tx2
          -- no need to write Header page
        fetch TIP page
        mark tx2 as committed
        write TIP page
        release TIP page

> Or by "Defer" do you mean really -- Header would only be written on transaction commits?

If there was no other page write - yes, the Header page will be written at commit.

> (That start transaction would no longer cause a page write)

Yes.

> If so, then that would be a good thing (still concerned about data from incomplete transactions).

Sure, it is a good thing ;) Still have concerns?

>>> In this way, the main server would not need to wait for Write IOs.
>>
>> It is not possible to completely remove the need to wait. One needs to wait for completion of a page write before marking this page as dirty again. I.e. writers must wait for each other.
>
> I was making a distinction between waiting for the other
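
The "Currently" and "Will be" timelines in the message above can be compared with a small simulation. This is only a sketch under stated assumptions, not Firebird code: the `Engine` class, its method names and the page names are hypothetical, and the only rule it models is "increment Next in memory, and flush the Header page before the first write of any other page".

```python
class Engine:
    """Toy model of Header page handling (hypothetical, not Firebird code)."""

    def __init__(self, defer_header_write):
        self.defer_header_write = defer_header_write
        self.next_tx = 0          # in-memory "Next transaction" counter
        self.header_on_disk = 0   # value of Next last written to disk
        self.header_writes = 0    # number of Header page writes so far
        self.tip_on_disk = {}     # tx id -> state persisted in the TIP
        self.write_log = []       # order of page writes, for inspection

    def _write_header(self):
        self.header_on_disk = self.next_tx
        self.header_writes += 1
        self.write_log.append("header")

    def _write_page(self, name):
        # Deferred scheme: a stale Header page is forced to disk before
        # any other page is written.
        if self.defer_header_write and self.header_on_disk != self.next_tx:
            self._write_header()
        self.write_log.append(name)

    def start(self):
        self.next_tx += 1
        if not self.defer_header_write:
            self._write_header()  # current scheme: one write per start
        return self.next_tx

    def commit(self, tx, dirty_pages):
        for page in dirty_pages:
            self._write_page(page)
        self._write_page("tip")   # TIP write also forces a stale Header
        self.tip_on_disk[tx] = "committed"


def run(defer):
    """Replay the two-transaction scenario from the timelines."""
    engine = Engine(defer)
    tx1 = engine.start()
    tx2 = engine.start()
    engine.commit(tx1, ["data-A"])
    engine.commit(tx2, ["data-B"])
    return engine
```

In this model `run(False)` writes the Header page once per transaction start, while `run(True)` writes it once, just before the first dirty page of tx1, and the commit of tx2 needs no Header write at all.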
Re: [Firebird-devel] SUPERSERVER_V2
>> Async/Overlapped IO allows for IO on any number of file blocks (aka pages) without limit to their locations, consecutive or not.
>
> Your words "single operation for any storage device" make me think that you are referring to a single OS call. There is no such API on the "only platform that matters" which allows posting many IO requests in one call (*nix has it, btw). Note, a write of 10 consecutive pages in one call to WriteFileXXX is still a single IO request from the application (engine) POV.

Windows does have the ability to post many IO requests in a single call -- see Overlapped IO structure.

Regarding "consecutive pages" do you mean that the pages fall one after the other, or that 10 pages are written consecutively?

>>>> I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.
>>>
>>> The idea is to defer the header page write up to the write of any other page, in the hope that other transactions could start in between.
>>
>> But if the write is deferred to the start of the next transaction...
>
> The header page write is deferred to the write of any other page, not to the start of the next transaction.
>
>> How would the database know what the last committed transaction was?
>
> If a few transactions started and incremented the Next counter in memory only, and there were no page writes in the meantime - what is the problem? No transactions were committed (as there were no page writes). Nobody ever knows that the transactions existed (except their users). There are no visible effects in the database for the other users.

What do page writes have to do with transaction commits?

What about the data changes that those transactions could apply to database pages, before the transaction is committed? They would be written to disk, no?

If the engine dies before those transactions are committed. When the engine restarts, how would it clean up the incomplete changes?

>> Wouldn't this detail be required on server restart, if the server abended/was killed right after the transaction write?
>
> What detail? Last committed tx? It is fixed in the TIP and requires a page write, which will force the Header page to disk before the TIP page is written.

But you are proposing to delay the Header page write, no?

If the Header needs to be written before the TIP can be written and the TIP provides the details about the last committed Tx, how would it be possible to defer Header writes?

Or by "Defer" do you mean really -- Header would only be written on transaction commits? (That start transaction would no longer cause a page write)

If so, then that would be a good thing (still concerned about data from incomplete transactions).

>> In this way, the main server would not need to wait for Write IOs.
>
> It is not possible to completely remove the need to wait. One needs to wait for completion of a page write before marking this page as dirty again. I.e. writers must wait for each other.

I was making a distinction between waiting for the other nodes to acknowledge the receipt of the page (which can be very fast) and the need to wait for the page to be actually written to storage (slower). Writers need only wait for the other nodes to ACK.


Sean

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
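
The restart question raised in this message is answered elsewhere in the thread: transactions that never committed are still marked active in the TIP, the engine detects that they are really dead, and their changes are undone. A minimal sketch of that idea follows. The state names, the `recover_after_crash` helper and the flat list of record versions are assumptions made for illustration; the engine's real mechanism is more involved.

```python
ACTIVE, COMMITTED, ROLLED_BACK = "active", "committed", "rolled_back"


def recover_after_crash(tip, record_versions):
    """Toy restart logic (hypothetical, not Firebird's implementation).

    tip: dict of tx id -> state as found on disk after the crash.
    record_versions: list of (tx id, payload) pairs found on data pages.
    No transaction survives a crash, so every tx still marked ACTIVE
    is really dead and is treated as rolled back.
    """
    new_tip = {
        tx: (ROLLED_BACK if state == ACTIVE else state)
        for tx, state in tip.items()
    }
    # Only versions written by committed transactions remain visible;
    # versions left behind by dead transactions are dropped (undone).
    surviving = [
        (tx, payload)
        for tx, payload in record_versions
        if new_tip.get(tx) == COMMITTED
    ]
    return new_tip, surviving
```

For example, with `tip = {1: COMMITTED, 2: ACTIVE}` and versions written by both transactions, only the version written by transaction 1 survives the restart.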
Re: [Firebird-devel] SUPERSERVER_V2
29.05.2017 23:45, Leyne, Sean wrote:

> Vlad,
>
>>>>> - Asynchronous File I/O
>>>>
>>>> It is not really asynchronous, as it waits for the completion of every single IO request.
>>>
>>> True, but it allows the storage controller to decide the best order in which to perform the operations...
>>
>> Order of what? IO requests are queued serially by the same thread. It is exactly the same as without this code ;)
>
> You are referring to how the application would invoke the Async IO and provide the Overlapped IO structure.

I speak about the subject - SUPERSERVER_V2.

> I am referring to the fact that the actual order of execution of the IOs within the Overlapped IO structure is not guaranteed. The storage controller is free to process the IO requests in any order it wants, which totally destroys the "carefully ordered write" basis that maintains the FB database integrity.

That is possible only if the engine posts a write request for the low order page without waiting for completion of the write of the high order page. I.e. it is easy to preserve careful write even in the overlapped IO case - just don't mix pages of different precedence levels in one IO request, even if they are physically consecutive.

>>> I could see a benefit for writing several pages, which are of the same 'priority/level' in the "carefully ordered writes", through a single operation for any storage device. Fewer calls would improve performance.
>>
>> Sure. But it is not present in SUPERSERVER_V2 (which we speak about).
>
> Agreed. That functionality is not in SUPERSERVER_V2. I wasn't trying to suggest that the functionality was present, rather, the current V2 functionality is missing this important (IMO) piece (see above).

My point is that the IO part of SUPERSERVER_V2 is not complete and contains no useful code.

>> And you miss one important word - consecutive. To write a few pages at once they must be consecutive in physical order.
>
> I don't think I missed anything. Async/Overlapped IO allows for IO on any number of file blocks (aka pages) without limit to their locations, consecutive or not.

Your words "single operation for any storage device" make me think that you are referring to a single OS call. There is no such API on the "only platform that matters" which allows posting many IO requests in one call (*nix has it, btw). Note, a write of 10 consecutive pages in one call to WriteFileXXX is still a single IO request from the application (engine) POV. If you speak about something else, please explain.

>>> Interesting that you want to continue with it...
>>
>> Why throw away good ideas? :)
>
> See above about order of execution of Overlapped IOs

I see no relation, explain, pls.

>>> Although, I see the benefits of avoiding the "sun level hotspot" that is the DB Header page write operations. I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.
>>
>> The idea is to defer the header page write up to the write of any other page, in the hope that other transactions could start in between.
>
> But if the write is deferred to the start of the next transaction...

The header page write is deferred to the write of any other page, not to the start of the next transaction.

> How would the database know what the last committed transaction was?

If a few transactions started and incremented the Next counter in memory only, and there were no page writes in the meantime - what is the problem? No transactions were committed (as there were no page writes). Nobody ever knows that the transactions existed (except their users). There are no visible effects in the database for the other users.

> Wouldn't this detail be required on server restart, if the server abended/was killed right after the transaction write?

What detail? Last committed tx? It is fixed in the TIP and requires a page write, which will force the Header page to disk before the TIP page is written.

>>> * This approach could actually provide several possible benefits...
>
> To be clear, my proposal would be to use RDMA solutions to defer all writes, by "posting" the writes to other nodes and then performing all writes in background on the main server. In this way, the main server would not need to wait for Write IOs.

It is not possible to completely remove the need to wait. One needs to wait for completion of a page write before marking this page as dirty again. I.e. writers must wait for each other.

> (A possible side benefit of this approach would be physical database replication, by having the other nodes write the pages to local storage)

Sure.

Regards,
Vlad
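
The rule stated in the message above (one IO request may cover several pages only if they are physically consecutive, and pages of different precedence levels must never share a request) can be sketched as a small planning function. This is an illustration under assumptions, not engine code: `plan_writes`, the `(page_number, level)` tuples and the convention that a smaller level must reach disk first are all hypothetical.

```python
from itertools import groupby


def plan_writes(dirty_pages):
    """Group dirty pages into IO requests that respect careful write.

    dirty_pages: iterable of (page_number, level) tuples with unique page
    numbers; a smaller level must be on disk before a larger one
    (hypothetical encoding of page precedence).
    Returns a list of (level, requests) "waves". Requests inside one wave
    may be posted together; the next wave is posted only after every
    request of the previous wave has completed.
    """
    waves = []
    ordered = sorted(dirty_pages, key=lambda p: (p[1], p[0]))
    for level, pages in groupby(ordered, key=lambda p: p[1]):
        requests = []
        for page_number, _ in pages:
            # Extend the current request only when the page is physically
            # adjacent to the last page already in it.
            if requests and requests[-1][-1] + 1 == page_number:
                requests[-1].append(page_number)
            else:
                requests.append([page_number])
        waves.append((level, requests))
    return waves
```

With pages 10, 11 and 13 at level 1 and pages 12 and 20 at level 0, pages 10 to 13 are physically consecutive, yet page 12 is kept out of the level 1 requests: the plan is one wave with requests `[12]` and `[20]`, followed by a wave with requests `[10, 11]` and `[13]`.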
Re: [Firebird-devel] SUPERSERVER_V2
Vlad,

>>>> - Asynchronous File I/O
>>>
>>> It is not really asynchronous, as it waits for the completion of every single IO request.
>>
>> True, but it allows the storage controller to decide the best order in which to perform the operations...
>
> Order of what? IO requests are queued serially by the same thread. It is exactly the same as without this code ;)

You are referring to how the application would invoke the Async IO and provide the Overlapped IO structure.

I am referring to the fact that the actual order of execution of the IOs within the Overlapped IO structure is not guaranteed. The storage controller is free to process the IO requests in any order it wants, which totally destroys the "carefully ordered write" basis that maintains the FB database integrity.

>> I could see a benefit for writing several pages, which are of the same 'priority/level' in the "carefully ordered writes", through a single operation for any storage device. Fewer calls would improve performance.
>
> Sure. But it is not present in SUPERSERVER_V2 (which we speak about).

Agreed. That functionality is not in SUPERSERVER_V2. I wasn't trying to suggest that the functionality was present, rather, the current V2 functionality is missing this important (IMO) piece (see above).

> And you miss one important word - consecutive. To write a few pages at once they must be consecutive in physical order.

I don't think I missed anything. Async/Overlapped IO allows for IO on any number of file blocks (aka pages) without limit to their locations, consecutive or not.

>> Interesting that you want to continue with it...
>
> Why throw away good ideas? :)

See above about order of execution of Overlapped IOs

>> Although, I see the benefits of avoiding the "sun level hotspot" that is the DB Header page write operations.
>>
>> I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.
>
> The idea is to defer the header page write up to the write of any other page, in the hope that other transactions could start in between.

But if the write is deferred to the start of the next transaction...

How would the database know what the last committed transaction was?

Wouldn't this detail be required on server restart, if the server abended/was killed right after the transaction write?

>> * This approach could actually provide several possible benefits...

To be clear, my proposal would be to use RDMA solutions to defer all writes, by "posting" the writes to other nodes and then performing all writes in background on the main server. In this way, the main server would not need to wait for Write IOs.

(A possible side benefit of this approach would be physical database replication, by having the other nodes write the pages to local storage)


Sean
Re: [Firebird-devel] SUPERSERVER_V2
26.05.2017 2:17, Leyne, Sean wrote:

>>> - Asynchronous File I/O
>>
>> It is not really asynchronous, as it waits for the completion of every single IO request.
>
> True, but it allows the storage controller to decide the best order in which to perform the operations...

Order of what? IO requests are queued serially by the same thread. It is exactly the same as without this code ;)

>> Also, note, it completely disables file system caching. It kills write performance
>
> Really?

Yes. You may try Firebird with the file system cache disabled to evaluate it.

> I could see a benefit for writing several pages, which are of the same 'priority/level' in the "carefully ordered writes", through a single operation for any storage device. Fewer calls would improve performance.

Sure. But it is not present in SUPERSERVER_V2 (which we speak about). And you miss one important word - consecutive. To write a few pages at once they must be consecutive in physical order.

> Separately, I seem to recall that the feature was completed and released in an IB 6.5+ release.

It is easy to check.

>> There is no UNIX part of it, btw.
>
> Unix/Linux... Smm-*nix!! Don't you know, Windows is the only platform that matters!

Sure ;)

>>> - Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)
>>
>> This is the most mature piece of code and I'm going to use it as a base for our implementation. It has no support for CS and, of course, it must be tested very carefully.
>
> Interesting that you want to continue with it...

Why throw away good ideas? :)

> Although, I see the benefits of avoiding the "sun level hotspot" that is the DB Header page write operations. I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.

The idea is to defer the header page write up to the write of any other page, in the hope that other transactions could start in between.

> * This approach could actually provide several possible benefits...
>
> Even over a 10GBs TCP connection (w/properly configured MTU) MPI latencies (for 4KB messages) are < 250 micro-seconds (us), whereas SSD/PCIe SSD latencies are 10-5 (ms).
>
> Using 10GBs RDMA, latencies are < 100 micro-seconds (us).
>
> Using latest RDMA NICs, latencies are < 15 micro-seconds (us).

Good to know ;)

Regards,
Vlad
Re: [Firebird-devel] SUPERSERVER_V2
>> - Asynchronous File I/O
>
> It is not really asynchronous, as it waits for the completion of every single IO request.

True, but it allows the storage controller to decide the best order in which to perform the operations...

> Also, note, it completely disables file system caching. It kills write performance

Really?

I could see a benefit for writing several pages, which are of the same 'priority/level' in the "carefully ordered writes", through a single operation for any storage device. Fewer calls would improve performance.

Separately, I seem to recall that the feature was completed and released in an IB 6.5+ release.

> There is no UNIX part of it, btw.

Unix/Linux... Smm-*nix!! Don't you know, Windows is the only platform that matters!

>> - Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)
>
> This is the most mature piece of code and I'm going to use it as a base for our implementation. It has no support for CS and, of course, it must be tested very carefully.

Interesting that you want to continue with it...

Although, I see the benefits of avoiding the "sun level hotspot" that is the DB Header page write operations. I don't see how database integrity can be maintained if the header page changes are not persisted to disk immediately -- aside from an MPI based multi-node cluster where page changes are sent to other nodes (as witnesses for safekeeping)*.


Sean


* This approach could actually provide several possible benefits...

Even over a 10GBs TCP connection (w/properly configured MTU) MPI latencies (for 4KB messages) are < 250 micro-seconds (us), whereas SSD/PCIe SSD latencies are 10-5 (ms).

Using 10GBs RDMA, latencies are < 100 micro-seconds (us).

Using latest RDMA NICs, latencies are < 15 micro-seconds (us).
Re: [Firebird-devel] SUPERSERVER_V2
25.05.2017 22:18, Leyne, Sean wrote:

>> What's SUPERSERVER_V2 in the code?
>
> My review of the code (back in 2003, see attached) found the following:
>
> - Asynchronous File I/O

It is not really asynchronous, as it waits for the completion of every single IO request. Also, note, it completely disables file system caching. That kills write performance and there is nothing to compensate for it. Probably, this part was "work in progress" in IB6 times. There is no UNIX part of it, btw.

> - PreFetch Data Pages (i.e. statement is a natural scan so read-ahead in the file...)

Yes, but it is a very naive implementation and I have big doubts that it is efficient.

> - Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)

This is the most mature piece of code and I'm going to use it as a base for our implementation. It has no support for CS and, of course, it must be tested very carefully.

Regards,
Vlad
Re: [Firebird-devel] SUPERSERVER_V2
25.05.2017 18:53, Adriano dos Santos Fernandes wrote:

> What's SUPERSERVER_V2 in the code?

Old attempt by Borland to implement some features that can be utilized by a properly threaded SuperServer.

> Will it be used some day? Will it be removed some day?

We preserve it as a reference. Some work by Vlad is partially based on that code.


Dmitry
Re: [Firebird-devel] SUPERSERVER_V2
Forgot the attachment...

> -----Original Message-----
> From: Leyne, Sean [mailto:s...@broadviewsoftware.com]
> Sent: Thursday, May 25, 2017 3:19 PM
> To: For discussion among Firebird Developers <de...@lists.sourceforge.net>
> Subject: Re: [Firebird-devel] SUPERSERVER_V2
>
> > What's SUPERSERVER_V2 in the code?
>
> My review of the code (back in 2003, see attached) found the following:
>
> - Asynchronous File I/O
> - PreFetch Data Pages (i.e. statement is a natural scan so read-ahead in the file...)
> - Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)
>
> > Will it be used some day?
>
> I think there are some good things, though they need to be tested.
>
> In general, however, I'm not sure that those features (all in the disk IO area) have not been outpaced by technological changes -- namely,
> - Storage controllers with large RAM cache (2 or 4GB or in some cases 64GB)
> - Storage controllers with large SSD supported read caches
> - SSDs (10-15x IOPs of HDDs)
> - PCIe SSDs (5-10x IOPs of SSDs)
> - Cross-point/Optane/NV-DIMM storage (5-10x IOPs of PCIe SSDs)
>
> Sean

From: Leyne, Sean
Sent: Friday, January 31, 2003 10:40 PM
To: Firebird-Dev (E-mail)
Subject: [Firebird-devel] Code Cleanup SUPERSERVER_V2

In all this discussion about FlushFileBuffers and the jrd/winnt.cpp module, you can't avoid the references to the SUPERSERVER_V2 define.

I already knew that one of the features anticipated by this code upgrade was Asynchronous/overlapped file I/O, so I was going to change the define to be ASYNC_FILE_IO. But first, as a good programmer should, I decided to check the rest of the code modules.

What I found was that the define covers three separate feature enhancements, those being:

- Asynchronous File I/O
- PreFetch Data Pages (i.e. statement is a natural scan so read-ahead in the file...)
- Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)

Considering that the state of these features is unknown, simply turning on the SUPERSERVER_V2 define is unrealistic; there would be too many things to test at once.

Accordingly, I am suggesting that we split the define (as the case dictates). The defines I'm proposing are: ASYNC_FILE_IO, PREFETCH_PAGES and DEFERRED_HEADER_WRITE.

I'm thinking of making these changes within the next week; does anyone have any comments?


Sean
Re: [Firebird-devel] SUPERSERVER_V2
> What's SUPERSERVER_V2 in the code?

My review of the code (back in 2003, see attached) found the following:

- Asynchronous File I/O
- PreFetch Data Pages (i.e. statement is a natural scan so read-ahead in the file...)
- Defer Header Page Write (i.e. reduce the number of times that the header page is written to disk)

> Will it be used some day?

I think there are some good things, though they need to be tested.

In general, however, I'm not sure that those features (all in the disk IO area) have not been outpaced by technological changes -- namely,

- Storage controllers with large RAM cache (2 or 4GB or in some cases 64GB)
- Storage controllers with large SSD supported read caches
- SSDs (10-15x IOPs of HDDs)
- PCIe SSDs (5-10x IOPs of SSDs)
- Cross-point/Optane/NV-DIMM storage (5-10x IOPs of PCIe SSDs)


Sean
Re: [Firebird-devel] SUPERSERVER_V2 define and code can be removed ?
On 03/30/15 16:16, marius adrian popa wrote:

> Seems that it is not used for years
> https://github.com/FirebirdSQL/core/search?utf8=%E2%9C%93q=SUPERSERVER_V2
>
> Jim removed it in Vulcan

Please do not clean it up.

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] SUPERSERVER_V2 define and code can be removed ?
No problem, this is why I asked.

Related old thread with the cleanup question:
https://sourceforge.net/p/firebird/mailman/message/15759466/

On Mon, Mar 30, 2015 at 4:31 PM, Alex Peshkoff peshk...@mail.ru wrote:

> On 03/30/15 16:16, marius adrian popa wrote:
>
>> Seems that it is not used for years
>> https://github.com/FirebirdSQL/core/search?utf8=%E2%9C%93q=SUPERSERVER_V2
>>
>> Jim removed it in Vulcan
>
> Please do not cleanup it.