Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Fri, Feb 10, 2012 at 10:03:06AM -0800, Jan Dubois wrote: On Fri, 10 Feb 2012, Tim Bunce wrote: On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote: If anyone knows of a more elegant way to make a function from DBI.xs available to DBD:: code, please let me know! I hope to take a proper look in the next day or so. The standard way nowadays seems to be to use ExtUtils::Depends. I'm not convinced though that it is any more elegant than the method you are already using. An example module using ExtUtils::Depends is B::Hooks::OP::Check. It is important to also list all exported functions in the FUNCLIST parameter to MakeMaker; otherwise the symbols will not be exported on AIX and Windows. Is this necessary for my patch? I assumed that just by treating my function as XS (i.e. by calling newXS), something magical would happen somehow. After all, none of the other XS functions need a special mention in Makefile.PL? -- The crew of the Enterprise encounter an alien life form which is surprisingly neither humanoid nor made from pure energy. -- Things That Never Happen in Star Trek #22
RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Fri, 17 Feb 2012, Dave Mitchell wrote: On Fri, Feb 10, 2012 at 10:03:06AM -0800, Jan Dubois wrote: On Fri, 10 Feb 2012, Tim Bunce wrote: On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote: If anyone knows of a more elegant way to make a function from DBI.xs available to DBD:: code, please let me know! I hope to take a proper look in the next day or so. The standard way nowadays seems to be to use ExtUtils::Depends. I'm not convinced though that it is any more elegant than the method you are already using. An example module using ExtUtils::Depends is B::Hooks::OP::Check. It is important to also list all exported functions in the FUNCLIST parameter to MakeMaker; otherwise the symbols will not be exported on AIX and Windows. Is this necessary for my patch? No, it isn't. I assumed that just by treating my function as XS (i.e. by calling newXS), something magical would happen somehow. After all, none of the other XS functions need a special mention in Makefile.PL? None of the XS functions are actually exposed to the linker, except for the boot_* symbol, which is already treated specially by MakeMaker. All other XS symbols are only accessed via the CV and not via the linker/loader symbol tables. Adding to FUNCLIST is necessary if you want to export the symbol at the C level and have it resolved by the OS loader. Cheers, -Jan
undefined symbol: mro_meta_init on 5.10.0 (was: speeding up XS_DBI_dispatch())
On Sun, Feb 05, 2012 at 09:04:23PM +, Dave Mitchell wrote: Anyway, attached is the final patch. I've tested it under 5.8.1, 5.8.9, 5.14.2 and 5.15.7, with various permutations of threaded builds. I've just noticed that DBI 1.617_901 has been failing on perl 5.10.0 http://matrix.cpantesters.org/?dist=DBI+1.617_901 The issue seems to be undefined symbol: mro_meta_init e.g.: http://www.cpantesters.org/cpan/report/ba083076-540a-11e1-a48f-e7fb434ae6f1 Any ideas? Tim.
Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote: I'm assuming that these won't be applied until after the windows() fork issue is fixed, so think of this email more as a preview. The first fix is relatively straightforward, and is private to DBI.xs, since the PERINTERP macros were kind of a prototype for MY_CXT anyway. Since the first was straightforward, and the windows fork issue has been resolved (thanks Jan!) I've applied the first patch and uploaded a trial release for cpantesters to chew on. Thanks Dave. I'll look more closely at the second this week. I won't apply it till we've discussed any issues (which are most likely to be around how to minimize the pain of synchronized updates of DBI and drivers). It might even make sense to support both old and new mechanisms for a while. There's also the mro_meta_init issue. Tim.
Re: speeding up XS_DBI_dispatch()
On 11/02/2012 00:48, Jan Dubois wrote: On Fri, 10 Feb 2012, Dave Mitchell wrote: On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote: Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? I'm afraid not. The problem is due to the getenv() call during interpreter cloning. On Windows Perl keeps a virtual host environment for each interpreter, with its own cwd, and its own set of environment variables. I don't have time to figure out _why_ this is an issue, but maybe you can check the environment variable during the BOOT section and just set a global variable instead, to work around this issue? if (getenv(PERL_DBI_XSBYPASS)) use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS)); On my old Windows 2000 system the test would simply abort with the message abnormal program termination. I was just guessing that this is somehow triggered by the abort() function in the C runtime. Setting a breakpoint on it confirms it, even though the stack trace looks incomplete (I don't see PerlEnvGetenv() calling abort() directly): MSVCRT!abort perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462] DBI!dbi_bootinit+0x276 [DBI.xs @ 468] DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280] perl514!Perl_pp_entersub+0x701 [..\pp_hot.c @ 3049] perl514!Perl_runops_standard+0xc [..\run.c @ 41] perl514!Perl_call_sv+0x195 [perl.c @ 2633] perl514!perl_clone_using+0x18ad [..\sv.c @ 13422] perl514!PerlProcFork+0x89 [perlhost.h @ 1854] perl514!Perl_pp_fork+0x42 [..\pp_sys.c @ 4044] perl514!Perl_runops_standard+0xc [..\run.c @ 41] perl514!S_run_body+0xf5 [perl.c @ 2351] perl514!perl_run+0x179 [perl.c @ 2269] perl514!RunPerl+0xed [perllib.c @ 270] perl!main+0x12 [perlmain.c @ 22] perl!mainCRTStartup+0xe3 KERNEL32!BaseProcessStart+0x3d Anyways, I would suggest trying a workaround by moving the getenv() call outside the cloning operation. Cheers, -Jan Nice find Jan. Commenting out the getenv works for me. Martin
Re: speeding up XS_DBI_dispatch()
On Fri, Feb 10, 2012 at 04:48:38PM -0800, Jan Dubois wrote: On Fri, 10 Feb 2012, Dave Mitchell wrote: On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote: Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? I'm afraid not. The problem is due to the getenv() call during interpreter cloning. On Windows Perl keeps a virtual host environment for each interpreter, with its own cwd, and its own set of environment variables. I don't have time to figure out _why_ this is an issue, but maybe you can check the environment variable during the BOOT section and just set a global variable instead, to work around this issue? When called from BOOT dbi_bootinit() gets passed a NULL parent_dbis parameter, but when called from CLONE it's not null, so this might be a good fix: @@ -579,7 +579,10 @@ gv_fetchpv(DBI::lasth, GV_ADDMULTI, SVt_PV); gv_fetchpv(DBI::rows, GV_ADDMULTI, SVt_PV); -if (getenv(PERL_DBI_XSBYPASS)) +/* we only need to check the env var on the initial boot + * which is handy because it can core dump during CLONE on windows + */ +if (!parent_dbis getenv(PERL_DBI_XSBYPASS)) use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS)); } Martin, could you try that? Setting a breakpoint on it confirms it, even though the stack trace looks incomplete (I don't see PerlEnvGetenv() calling abort() directly): MSVCRT!abort perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462] DBI!dbi_bootinit+0x276 [DBI.xs @ 468] DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280] Anyways, I would suggest trying a workaround by moving the getenv() call outside the cloning operation. Hopefully the above will do the trick. Meanwhile, aborting if getenv is called during a CLONE sure seems like a bug worthy of a ticket for p5p. Thanks Jan! Tim.
Re: speeding up XS_DBI_dispatch()
On 11 February 2012 14:55, Tim Bunce tim.bu...@pobox.com wrote: On Fri, Feb 10, 2012 at 04:48:38PM -0800, Jan Dubois wrote: MSVCRT!abort perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462] DBI!dbi_bootinit+0x276 [DBI.xs @ 468] DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280] Anyways, I would suggest trying a workaround by moving the getenv() call outside the cloning operation. Hopefully the above will do the trick. Meanwhile, aborting if getenv is called during a CLONE sure seems like a bug worthy of a ticket for p5p. I am not so sure. During a clone, the internals are potentially in an indeterminate state. I could see it being argued that the only thing you should be doing during a clone is cloning, not environment lookups. But of course, if we can fix it that would be better. And the above is pure speculation. Yves -- perl -Mre=debug -e /just|another|perl|hacker/
Re: speeding up XS_DBI_dispatch()
On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote: Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? I'm afraid not. -- All wight. I will give you one more chance. This time, I want to hear no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers. -- Life of Brian
Re: speeding up XS_DBI_dispatch()
On 10/02/12 10:01, Dave Mitchell wrote: On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote: Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? I'm afraid not. Have you any suggestion on how I could help debug this? I do have A windows machine which exhibits the problem. Martin -- Martin J. Evans Easysoft Limited http://www.easysoft.com
Re: speeding up XS_DBI_dispatch()
On 10/02/12 10:54, Tim Bunce wrote: On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote: Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on Windows. Ah, 1.616_901 and 1.616_902! So unrelated to Dave's method cache work. See if this works: - static int use_xsbypass = 1; + static int use_xsbypass = 0; Think I already tried that last night by setting the env var and it made no difference. Don't have the machine to hand right now although I could try and set it up here. If not, then it's probably something to do with the dPERINTERP changes. which is my suspicion. Running with a high trace level might shed some light but a strack trace of some kind would be ideal. I tried tracing but nothing stood out - I could post the end of the trace later. Tim. Martin -- Martin J. Evans Easysoft Limited http://www.easysoft.com
Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Thu, Jan 26, 2012 at 12:26:44PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 05:37:13PM -0800, Jan Dubois wrote: On Wed, 25 Jan 2012, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. The dPERINTERP stuff was added by ActiveState years ago to support MULTIPLICITY. I don't remember the details now. I do recall that driver authors are encouraged to avoid using the DBIS macro due to the cost. There's rarely a need now as DBI handles carry a pointer to the dbi_state structure. The DBI::DBD docs say Sorry, only skimming the conversation, so maybe this is obvious to you: PERINTERP was a prototype from the 5.005 days for the MY_CXT stuff that is now in the code. If you look at 5.8.1, then you'll see MY_CXT being almost identical to PERINTERP in DBI. But switching to MY_CXT should give you performance benefits on later Perl versions, which have an improved MY_CXT implementation. Great. Thanks Jan. Anyone want to take a shot at that? I attach two patches; the first replaces dPERINTERP with dMY_THX, while the second extends this to the DBIS stuff too. The headline figure (YMMV etc) is that on a recent threaded perl, these two patches collectively make my mysql empty test loop twice as fast!!! : while ($sth-fetch) { $c++ } On a perl 5.10.0 (before dMY_THX was made much more efficient), the speedup is less spectacular, but is still about 25%. I'm assuming that these won't be applied until after the windows() fork issue is fixed, so think of this email more as a preview. The first fix is relatively straightforward, and is private to DBI.xs, since the PERINTERP macros were kind of a prototype for MY_CXT anyway. The second patch, which makes DBIS more efficient, is a lot more complex, and more likely to break things (especially as it's changing a bunch of macros that are directly #included by the DBD drivers. You may need to bump API version numbers; I don't understand that bit. It works by adding a C function, _dbi_state_lval(), to DBI.xs which returns a pointer to the static(ish) dbistate struct that only DBI.xs knows about. However, I couldn't find any way for DBD:: code to call functions within DBI, so I did a little hack: I faked up an XS sub, DBI::_dbi_state_lval(), whose CvXSUB points to the _dbi_state_lval function. Since _dbi_state_lval() isn't actually an XS function, perl-level code that tries to call DBI::_dbi_state_lval() will crash and burn. If anyone knows of a more elegant way to make a function from DBI.xs available to DBD:: code, please let me know! Anyway, at the DBD end of things, the code extracts the address of the function from DBI::_dbi_state_lval's CvXSUB slot, and caches it in a static var. Then in threaded builds, DBIS expands to a call to a static function that calls _dbi_state_lval() which returns (MY_CXT.dbi_state). In unthreaded builds, it just returns the value of a static var as normal. Here are some timings; remember that MY_CXT was made a lot faster in 5.10.0, so I've included timings for 5.8.9 too. As you'd expect, only the threaded build have a significant speed-up; I think the tiny speed-ups in the non-threaded builds are just noise. The three timings (in sec) for the basic while($sth-fetch){$c++} loop are for: (1) the baseline: r15128 plus my method cache code (2) in addition, replace dPERINTERP with dMY_CXT (3) in addition, fix DBIS 1 2 3 - -- -- 40.10 - 29.965.8.9threaded, optimised 37.62 33.98 18.965.15.7 threaded, optimised 12.85 - 12.555.8.9 unthreaded, optimised 13.41 13.46 12.975.15.6 unthreaded, optimised The big saving between (2) and (3) is due to DBD::mysql still using DBIS; in particular, for every fetch call. -- This email is confidential, and now that you have read it you are legally obliged to shoot yourself. Or shoot a lawyer, if you prefer. If you have received this email in error, place it in its original wrapping and return for a full refund. By opening this email, you accept that Elvis lives. diff --git a/DBI.xs b/DBI.xs index ac161f4..887111d 100644 --- a/DBI.xs +++ b/DBI.xs @@ -16,8 +16,6 @@ #include sys/timeb.h # endif -#define MY_VERSION DBI( XS_VERSION ) - /* The XS dispatcher code can optimize calls to XS driver methods, * bypassing the usual call_sv() and argument handling overheads. * Just-in-case it causes problems there's an (undocumented) way @@ -277,40 +275,24 @@ static GV* inner_method_lookup(pTHX_ HV *stash, CV *cv, const char *meth_name) /* --- make DBI safe for multiple perl interpreters --- */ -/* Contributed by Murray Nesbitt of ActiveState */ -/* (This pre-dates, and should be replaced by, MY_CTX) */ +/* Originally contributed by Murray Nesbitt of
Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote: On Thu, Jan 26, 2012 at 12:26:44PM +, Tim Bunce wrote: I attach two patches; the first replaces dPERINTERP with dMY_THX, while the second extends this to the DBIS stuff too. The headline figure (YMMV etc) is that on a recent threaded perl, these two patches collectively make my mysql empty test loop twice as fast!!! : while ($sth-fetch) { $c++ } On a perl 5.10.0 (before dMY_THX was made much more efficient), the speedup is less spectacular, but is still about 25%. That's great news! I'm assuming that these won't be applied until after the windows() fork issue is fixed, so think of this email more as a preview. *nods* [...] If anyone knows of a more elegant way to make a function from DBI.xs available to DBD:: code, please let me know! I hope to take a proper look in the next day or so. 1 2 3 - -- -- 40.10 - 29.965.8.9threaded, optimised 37.62 33.98 18.965.15.7 threaded, optimised 12.85 - 12.555.8.9 unthreaded, optimised 13.41 13.46 12.975.15.6 unthreaded, optimised Most excellent! The big saving between (2) and (3) is due to DBD::mysql still using DBIS; in particular, for every fetch call. That should be a good reminder to driver authors to avoid using DBIS! Thanks again Dave. Tim.
RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Fri, 10 Feb 2012, Tim Bunce wrote: On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote: If anyone knows of a more elegant way to make a function from DBI.xs available to DBD:: code, please let me know! I hope to take a proper look in the next day or so. The standard way nowadays seems to be to use ExtUtils::Depends. I'm not convinced though that it is any more elegant than the method you are already using. An example module using ExtUtils::Depends is B::Hooks::OP::Check. It is important to also list all exported functions in the FUNCLIST parameter to MakeMaker; otherwise the symbols will not be exported on AIX and Windows. Personally I've just added names to FUNCLIST, and then called dlopen/dlsym or LoadLibrary/GetProcAddress on the embedding side (not actually another module, but an app hosting a Perl interpreter). Ideally you should be able to do this with the DynaLoader dl_load_file() and dl_find_symbol() functions, but I found some claims (in the FFI module docs) that this has problems on Windows. I don't know what those problems are, but it looks like dl_load_file() on Windows will return a handle to the current executable when the requested library cannot be found. I have no clue why it is doing that. So, in summary, I think the current approach isn't such a bad solution at all. :) Cheers, -Jan
RE: speeding up XS_DBI_dispatch()
On Fri, 10 Feb 2012, Dave Mitchell wrote: On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote: Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? I'm afraid not. The problem is due to the getenv() call during interpreter cloning. On Windows Perl keeps a virtual host environment for each interpreter, with its own cwd, and its own set of environment variables. I don't have time to figure out _why_ this is an issue, but maybe you can check the environment variable during the BOOT section and just set a global variable instead, to work around this issue? if (getenv(PERL_DBI_XSBYPASS)) use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS)); On my old Windows 2000 system the test would simply abort with the message abnormal program termination. I was just guessing that this is somehow triggered by the abort() function in the C runtime. Setting a breakpoint on it confirms it, even though the stack trace looks incomplete (I don't see PerlEnvGetenv() calling abort() directly): MSVCRT!abort perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462] DBI!dbi_bootinit+0x276 [DBI.xs @ 468] DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280] perl514!Perl_pp_entersub+0x701 [..\pp_hot.c @ 3049] perl514!Perl_runops_standard+0xc [..\run.c @ 41] perl514!Perl_call_sv+0x195 [perl.c @ 2633] perl514!perl_clone_using+0x18ad [..\sv.c @ 13422] perl514!PerlProcFork+0x89 [perlhost.h @ 1854] perl514!Perl_pp_fork+0x42 [..\pp_sys.c @ 4044] perl514!Perl_runops_standard+0xc [..\run.c @ 41] perl514!S_run_body+0xf5 [perl.c @ 2351] perl514!perl_run+0x179 [perl.c @ 2269] perl514!RunPerl+0xed [perllib.c @ 270] perl!main+0x12 [perlmain.c @ 22] perl!mainCRTStartup+0xe3 KERNEL32!BaseProcessStart+0x3d Anyways, I would suggest trying a workaround by moving the getenv() call outside the cloning operation. Cheers, -Jan
Re: speeding up XS_DBI_dispatch()
On 9 February 2012 22:53, Martin J. Evans martin.ev...@easysoft.com wrote: Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on Windows. The test 16destroy.t fails with a Windows error This application has requested the Runtime to terminate in an unusual way Initially I saw this in 1.617 but then I stepped backwards to find the problem occurred between these 2 versions. Some smokers have reported this problem in 5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook which is strawberry perl 5.12.0. As you'll see from the Changes file the only 2 entries added were: Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce) Well perls fork() relies on threaded perl so it could very easily be Dave's patch. Dave do you have access to a win32 build environment? Added note to DBI::Profile about async queries (Marcel GrFCnauer). I've diffed the two versions but there were quite a lot of changes and so far I cannot locate the problem. I've given up for tonight but might find time to do more. I have a vastly reduced 16destroy.t which still demonstrates the problem if it helps anyone. Is it possible to do something like git-bisect here? Yves -- perl -Mre=debug -e /just|another|perl|hacker/
Re: speeding up XS_DBI_dispatch()
On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote: Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on Windows. The test 16destroy.t fails with a Windows error This application has requested the Runtime to terminate in an unusual way I'd guess that's Windows polite version of Seg Fault ot Bus Error :) I know ~0 about windows but I seem to recall that there _might_ be something useful in the system event logs. Initially I saw this in 1.617 but then I stepped backwards to find the problem occurred between these 2 versions. Some smokers have reported this problem in 5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook which is strawberry perl 5.12.0. As you'll see from the Changes file the only 2 entries added were: Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce) Added note to DBI::Profile about async queries (Marcel GrFCnauer). I've diffed the two versions but there were quite a lot of changes and so far I cannot locate the problem. I've given up for tonight but might find time to do more. When you get a moment could you try this change: --- DBI.xs (revision 15137) +++ DBI.xs (working copy) @@ -3536,7 +3536,7 @@ } } -if (is_orig_method_name) +if (0 is_orig_method_name) imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh), cv, meth_name); else which will, I think, effectively disable the runtime side of the method cache changes. Tim.
Re: speeding up XS_DBI_dispatch()
On 09/02/2012 22:12, Tim Bunce wrote: On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote: Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on Windows. The test 16destroy.t fails with a Windows error This application has requested the Runtime to terminate in an unusual way I'd guess that's Windows polite version of Seg Fault ot Bus Error :) I know ~0 about windows but I seem to recall that there _might_ be something useful in the system event logs. I'll look more but no luck so far. Initially I saw this in 1.617 but then I stepped backwards to find the problem occurred between these 2 versions. Some smokers have reported this problem in 5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook which is strawberry perl 5.12.0. As you'll see from the Changes file the only 2 entries added were: Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce) Added note to DBI::Profile about async queries (Marcel GrFCnauer). I've diffed the two versions but there were quite a lot of changes and so far I cannot locate the problem. I've given up for tonight but might find time to do more. When you get a moment could you try this change: --- DBI.xs (revision 15137) +++ DBI.xs (working copy) @@ -3536,7 +3536,7 @@ } } -if (is_orig_method_name) +if (0 is_orig_method_name) imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh), cv, meth_name); else which will, I think, effectively disable the runtime side of the method cache changes. Tim. nope, didn't change anything - it still failed. Martin
Re: speeding up XS_DBI_dispatch()
On Wed, Feb 08, 2012 at 10:58:37AM +, Tim Bunce wrote: PS - I'm currently working on bringing PERINTERP and DBIS into the Brave New World of MY_CXT. Wonderful! I presume that'll cause an API change that'll require compiled drivers to be recompiled (ie. we'd bump DBISTATE_VERSION so check_version() will croak). It's been many years since a DBI upgrade has required that (1998 as far as I can tell from the Changes file). I dunno. The thing that is changing from DBD's code perspective is some of the macros; in particular, on threaded builds, DBIS_PUBLISHED_LVALUE is now defined(*) as: #define DBIS_PUBLISHED_LVALUE (*(dbi_state_lval(aTHX))) where dbi_state_lval() is a new function. When I tried doing a 'make test' of DBI against a perl with an existing DBI and DBD::mysql installed, one of the first DBI tests (01basics.t? 02dbidrv.t?) chucked out some mysql errors, so I just deleted DBD::mysql from the perl. Perhaps that means the version does indeed need bumping? Anyway, you'll have a better idea once I submit the patch - I'll leave any bumping to you. (*) well, currently, anyway. -- My Dad used to say 'always fight fire with fire', which is probably why he got thrown out of the fire brigade.
Re: speeding up XS_DBI_dispatch()
On Sun, Feb 05, 2012 at 09:04:23PM +, Dave Mitchell wrote: On Mon, Jan 30, 2012 at 10:56:37PM +, Tim Bunce wrote: dbih_getcom2 does the first part of that when looking up DBI hande magic. I never added the move to the top because I figured it would be very rare for magic to get added to the inner hash of a DBI handle. It seems even less likely that someone would add magic to a DBI method, but you never know. Unless you can think of a plausible case I'd be happy if you just added the short-cut+fallback and skipped the shuffle. I'd already written the magic shuffler code by the time I saw your email, so unless you want me to rip it out, it's your's fro free! :) Anyway, attached is the final patch. I've tested it under 5.8.1, 5.8.9, 5.14.2 and 5.15.7, with various permutations of threaded builds. It took quite a bit more work from the initial draft I showed you, mainly getting the threaded stuff right, and a stupid PL_generation / PL_sub_generation mix-up. I knew what the difference was between the two vars, but whenever I visually inspected the code, I kept seeing the former as the latter and couldn't understand why cache invalidation didn't work! -imp_msv = (SV*)gv_fetchmethod_autoload(DBIc_IMP_STASH(imp_xxh), meth_name, FALSE); +if (is_orig_method_name) +imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh), +cv, meth_name); +else +imp_msv = (SV*)gv_fetchmethod_autoload(DBIc_IMP_STASH(imp_xxh), +meth_name, FALSE); /* if method was a 'func' then try falling back to real 'func' method */ if (!imp_msv (ima_flags IMA_FUNC_REDIRECT)) { @@ -3463,7 +3582,7 @@ -if (!imp_msv) { +if (!imp_msv || !GvCV(imp_msv)) { What's the GvCV test for? +++ t/31methcache.t (revision 0) The tests look great. Thanks. +BEGIN { eval use threads; }# Must be first +my $use_threads_err = $@; +# With this test code and threads, 5.8.1 has issues with freeing freed +# scalars, while 5.8.9 doesn't; I don't know about in-between - DAPM issues? i.e., warnings at global destruction? The Mac OS X perl 5.8.8 with threads that I have handy works ok. +my $has_threads = $Config{useithreads} $] = 5.008009; +die $use_threads_err if $has_threads $use_threads_err; Umm. I'm tempted to disable the cache at compile time for perl 5.8.8. Meanwhile, I've applied the patch and uploaded a trial release for cpantesters to chew on. Once there are sufficient reports from old perls that have threading enabled I'll scan them for warnings. Thanks again Dave! Tim.
Re: speeding up XS_DBI_dispatch()
On Tue, Feb 07, 2012 at 10:54:12PM +, Tim Bunce wrote: -if (!imp_msv) { +if (!imp_msv || !GvCV(imp_msv)) { What's the GvCV test for? One of the more vicious tests in t/31methcache.t does local $DBD::Sponge::st::{fetch}; which causes gv_fetchmeth(...,'fetch') to return a GV that has a null CV slot (IIRC). +# With this test code and threads, 5.8.1 has issues with freeing freed +# scalars, while 5.8.9 doesn't; I don't know about in-between - DAPM issues? i.e., warnings at global destruction? Yes, similar to those that t/35thrclone.t generates; i.e. I assume that it's a generic issue with ithreads in early 5.8.x releases (I remember fixing a lot of such issues, and Nicholas backporting them into maint-5.8). So I decided to just skip the threading-related cache tests for early 5.8.x's. The Mac OS X perl 5.8.8 with threads that I have handy works ok. +my $has_threads = $Config{useithreads} $] = 5.008009; +die $use_threads_err if $has_threads $use_threads_err; Umm. I'm tempted to disable the cache at compile time for perl 5.8.8. I don't think the caching code is bringing up any new threading issues with early 5.8.x's; i.e. the new test file t/31methcache gives the same pattern of 'Attempt to free unreferenced scalar' warnings with or without the cache code. Meanwhile, I've applied the patch and uploaded a trial release for cpantesters to chew on. Once there are sufficient reports from old perls that have threading enabled I'll scan them for warnings. Ok thanks. PS - I'm currently working on bringing PERINTERP and DBIS into the Brave New World of MY_CXT. -- Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.
Re: speeding up XS_DBI_dispatch()
On Sun, Jan 29, 2012 at 10:06:58PM +, Tim Bunce wrote: On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote: The code itself (see diff below) just attaches some magic to the outer CV that caches the GV of the corresponding inner method. Any reason for not using the existing 'internal method attribute' mechanism? _install_method() already allocates a structure and attaches it the the CV using CvXSUBANY(cv).any_ptr. You could add the three elements of method_cache_t directly into dbi_ima_t. Then, since the dispatch code already has the structure pointer (ima) handy, your cache code could avoid the mg_findext call and use ima-method_cache directly. Well, the original reason was that I wasn't aware of dbi_ima_t :-) However, on further reflection, I'm not sure that would work, as ima-method_cache would contain reference-counted SV pointers that would need to be freed when the CV is freed, and be duped when the CV is cloned in a new thread. But there is no specific mechanism to allow this to happen, so I'd have to mess with CLONE (and maintain a list of special CVs that need cloning), and attach magic that has a free() method to ensure that ima is freed when the CV is. Which is all beginning to sound rather complex. So I'm tempted to stick with the magic approach. I could optimise it to assume the cache magic is always the first one (and if not, take a slower route that finds and then moves it to the top). That's also caused me to realise that my current code isn't threadsafe, as it it doesn't dup the cache entries. Easily fixed, though. -- Art is anything that has a label (especially if the label is untitled 1)
Re: speeding up XS_DBI_dispatch()
On Mon, Jan 30, 2012 at 12:48:32PM +, Dave Mitchell wrote: On Sun, Jan 29, 2012 at 10:06:58PM +, Tim Bunce wrote: On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote: The code itself (see diff below) just attaches some magic to the outer CV that caches the GV of the corresponding inner method. Any reason for not using the existing 'internal method attribute' mechanism? _install_method() already allocates a structure and attaches it the the CV using CvXSUBANY(cv).any_ptr. You could add the three elements of method_cache_t directly into dbi_ima_t. Then, since the dispatch code already has the structure pointer (ima) handy, your cache code could avoid the mg_findext call and use ima-method_cache directly. Well, the original reason was that I wasn't aware of dbi_ima_t :-) However, on further reflection, I'm not sure that would work, as ima-method_cache would contain reference-counted SV pointers that would need to be freed when the CV is freed, and be duped when the CV is cloned in a new thread. But there is no specific mechanism to allow this to happen, so I'd have to mess with CLONE (and maintain a list of special CVs that need cloning), and attach magic that has a free() method to ensure that ima is freed when the CV is. Which is all beginning to sound rather complex. *nods* So I'm tempted to stick with the magic approach. I could optimise it to assume the cache magic is always the first one (and if not, take a slower route that finds and then moves it to the top). That would be neat. Thanks. dbih_getcom2 does the first part of that when looking up DBI hande magic. I never added the move to the top because I figured it would be very rare for magic to get added to the inner hash of a DBI handle. It seems even less likely that someone would add magic to a DBI method, but you never know. Unless you can think of a plausible case I'd be happy if you just added the short-cut+fallback and skipped the shuffle. That's also caused me to realise that my current code isn't threadsafe, as it it doesn't dup the cache entries. Easily fixed, though. Ah. I scratched my head for a while looking for threadafe issues and missed that one. I'm glad you found it. Thanks again Dave. Tim.
Re: speeding up XS_DBI_dispatch()
On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: then the CPU usage of the while loop broke down as follows: 7.0% overhead of loop, i.e. while() {$c++} 19.2% handle the outer method call, i.e. $sth-fetch() of which half is method lookup, half is assemble args ($sth) and pp_entersub 12.7% XS_DBI_dispatch() doing DBI stuff 20.5% XS_DBI_dispatch() doing method lookup and doing direct XS call to DBD::mysql::fetchrow_arrayref() 40.6% DBD::mysql::fetchrow_arrayref() fetching a row Given that my hacky cache for fetch() lookup reduced overall execution time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2% to 16.7%, i.e. nearly halves the dispatch time. I've now written some working caching code, that reduces CPU usage on while ($sth-fetch()) {$c++} from 15.23s to 13.10s, which is a 14% saving!! The test code just fetches millions of rows from a 2-int table and then doesn't do anything real with the returned data, so a real-world example will see a smaller saving, but I'm still very pleased with it :-) It's not ready for use yet; it passes the DBI test suite, but: * I haven't tested it under ithreads or early perls yet; * I haven't tried it against the DBD::mysql test suite yet; * I haven't written any tests yet for the method cache invalidation, e.g. if code does *DBD::mysql::st::foo = sub {...}, will subsequent calls to $sth-foo() call the new function? As regards that last point, can you recommend where I should place the new tests? Is there an existing test script that they can be added to, or if a new script needs adding, what should it be called, and is there a good existing script to use as a template? The code itself (see diff below) just attaches some magic to the outer CV that caches the GV of the corresponding inner method. It also remembers the inner stash and generation number, so that the cache can be invalidated if the outer method is called on another class. So for example, in the following pathological example: while ($sth_mysql-fetch) { $sth_oracle-fetch; ... } the cache in the magic attached to DBI::st::fetch will oscillate between pointing to *DBD::mysql::st:fetch and the similar oracle function, so no speedup will be seen. But normally, when there is a constant mapping between the outer and inner subs, the cache will speed things up. The caching code *does* assume that the name of the outer sub never changes, i.e. that GvNAME(CvGV(cv)) in XS_DBI_dispatch is constant WRT cv; is this a reasonable assumption? Otherwise, I would have to cache the sub name too, which will slow things down. The code makes use of non-API knowledge about PL_generation and HvMROMETA(stash)-cache_gen, which could conceivably change in a later perl release. PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. I've since been told that a) I'm allowed to say who's funding me: it's booking.com (thanks, guys!); and that b) depending on how much time I burn on the cache issue, I may have time to look at threaded issues too, specifically the dPERINTERP rewrite using MY_CXT - assuming that no one beats me to it! Dave. Index: DBI.xs === --- DBI.xs (revision 15099) +++ DBI.xs (working copy) @@ -82,6 +82,8 @@ static I32 dbi_hash _((const char *string, long i)); static void dbih_dumphandle _((pTHX_ SV *h, const char *msg, int level)); static int dbih_dumpcom _((pTHX_ imp_xxh_t *imp_xxh, const char *msg, int level)); +static int method_cache_free(pTHX_ SV* sv, MAGIC* mg); +static GV* inner_method_lookup(pTHX_ HV *stash, CV *cv, const char *meth_name); char *neatsvpv _((SV *sv, STRLEN maxlen)); SV * preparse(SV *dbh, const char *statement, IV ps_return, IV ps_accept, void *foo); @@ -161,6 +163,74 @@ /* 32 bit magic FNV-0 and FNV-1 prime */ #define FNV_32_PRIME ((UV)0x01000193) + + +/* ext magic attached to outer CV methods to quickly locate the + * corresponding inner method + */ + +static MGVTBL method_cache_vtbl = { 0, 0, 0, 0, method_cache_free, 0, 0, 0 }; + +typedef struct { +HV *stash; /* the stash we found the GV in */ +GV *gv; /* the GV containing the inner sub */ +U32 generation; /* cache invalidation */ +} method_cache_t; + +static int method_cache_free(pTHX_ SV* sv, MAGIC* mg) +{ +method_cache_t *c = (method_cache_t *)(mg-mg_ptr); +SvREFCNT_dec(c-stash); +SvREFCNT_dec(c-gv); +return 0; +} + +static GV* inner_method_lookup(pTHX_ HV *stash, CV *cv, const char *meth_name) +{ +GV *gv; +method_cache_t *c; +MAGIC *mg = mg_findext((SV*)cv, DBI_MAGIC, method_cache_vtbl); + +if (mg) { +if ( (c=(method_cache_t *)(mg-mg_ptr)) + c-stash == stash +
Re: speeding up XS_DBI_dispatch()
On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote: I've now written some working caching code, that reduces CPU usage on while ($sth-fetch()) {$c++} from 15.23s to 13.10s, which is a 14% saving!! The test code just fetches millions of rows from a 2-int table and then doesn't do anything real with the returned data, so a real-world example will see a smaller saving, but I'm still very pleased with it :-) Nice. It's not ready for use yet; it passes the DBI test suite, but: * I haven't tested it under ithreads or early perls yet; * I haven't tried it against the DBD::mysql test suite yet; * I haven't written any tests yet for the method cache invalidation, e.g. if code does *DBD::mysql::st::foo = sub {...}, will subsequent calls to $sth-foo() call the new function? As regards that last point, can you recommend where I should place the new tests? Is there an existing test script that they can be added to, or if a new script needs adding, what should it be called, and is there a good existing script to use as a template? The DBI test suite is, sadly, rather crufty. Lots of test files have far outgrown their original scope. I'd reccomend starting a new one, say 31methcache.t. You could use DBD::Sponge (as many tests do, e.g. 30subclass.t) and create two statement handles each preloaded with different data. Then fetchrow_arrayref from one and fetchrow_hashref from the other, for example, checking you get back the right data with the right kind of reference. Just a thought. The code itself (see diff below) just attaches some magic to the outer CV that caches the GV of the corresponding inner method. Any reason for not using the existing 'internal method attribute' mechanism? _install_method() already allocates a structure and attaches it the the CV using CvXSUBANY(cv).any_ptr. You could add the three elements of method_cache_t directly into dbi_ima_t. Then, since the dispatch code already has the structure pointer (ima) handy, your cache code could avoid the mg_findext call and use ima-method_cache directly. The caching code *does* assume that the name of the outer sub never changes, i.e. that GvNAME(CvGV(cv)) in XS_DBI_dispatch is constant WRT cv; is this a reasonable assumption? Yes, I think so. Though some kind of 'kill switch' to disable the cache might be handy, in case odd things are happening and the cache is suspected. The code makes use of non-API knowledge about PL_generation and HvMROMETA(stash)-cache_gen, which could conceivably change in a later perl release. Which seems like a good argument for extending the perl API to cover that kind of functionality. PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. I've since been told that a) I'm allowed to say who's funding me: it's booking.com (thanks, guys!); and that b) depending on how much time I burn on the cache issue, I may have time to look at threaded issues too, specifically the dPERINTERP rewrite using MY_CXT - assuming that no one beats me to it! That would be great. Thanks again Dave. Dave. Index: DBI.xs Looks good. Tim.
Re: speeding up XS_DBI_dispatch()
On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: The DBI is always allowed to cheat in the name of performance! My simple but effective xsbypass hack doesn't seem to have caused problems in the years it's been there (though it's a pity threaded perls can't use it). Just out of curiosity, why can't threaded perls use it? To be honest I can't remember now. It's disabled #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT) Can you think of a reason? All tests pass. Maybe I was just being conservative. Ah, that's USE_THREADS not USE_ITHREADS. Ancient history. I've no idea what PERL_CAPI is. There's no reference to it in current perl headers and only a mention in Changes5.6: Log: ensure XS_LOCKS stuff happens *before* XSUB is entered under -DPERL_CAPI So I'll regard that as ancient history as well. As for PERL_OBJECT, I see that perl58delta.pod says PERL_OBJECT has been completely removed. So that's yet more ancient history. So it seems that xsbypass is always enabled in 5.8+ builds. Tim.
Re: speeding up XS_DBI_dispatch()
On Fri, Jan 27, 2012 at 02:53:47PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: Just out of curiosity, why can't threaded perls use it? To be honest I can't remember now. It's disabled #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT) Can you think of a reason? All tests pass. Maybe I was just being conservative. Ah, that's USE_THREADS not USE_ITHREADS. Ancient history. Actually, USE_THREADS is normally also defined when USE_ITHREADS is defined (the original USE_THREADS was renamed USE_5005THREADS). I've no idea what PERL_CAPI is. There's no reference to it in current perl headers and only a mention in Changes5.6: Log: ensure XS_LOCKS stuff happens *before* XSUB is entered under -DPERL_CAPI So I'll regard that as ancient history as well. As for PERL_OBJECT, I see that perl58delta.pod says PERL_OBJECT has been completely removed. So that's yet more ancient history. So it seems that xsbypass is always enabled in 5.8+ builds. So, I don't think its enabled, but it sounds like it could be. -- The Enterprise's efficient long-range scanners detect a temporal vortex distortion in good time, allowing it to be safely avoided via a minor course correction. -- Things That Never Happen in Star Trek #21
Re: speeding up XS_DBI_dispatch()
On Fri, Jan 27, 2012 at 03:33:51PM +, Dave Mitchell wrote: On Fri, Jan 27, 2012 at 02:53:47PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: Just out of curiosity, why can't threaded perls use it? To be honest I can't remember now. It's disabled #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT) Can you think of a reason? All tests pass. Maybe I was just being conservative. Ah, that's USE_THREADS not USE_ITHREADS. Ancient history. Actually, USE_THREADS is normally also defined when USE_ITHREADS is defined (the original USE_THREADS was renamed USE_5005THREADS). Ah, ok. [...] So it seems that xsbypass is always enabled in 5.8+ builds. So, I don't think its enabled, but it sounds like it could be. It's now enabled by default and there's an (undocumented) env var to disable it if the need ever arises. Tim.
dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())
On Wed, Jan 25, 2012 at 05:37:13PM -0800, Jan Dubois wrote: On Wed, 25 Jan 2012, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. The dPERINTERP stuff was added by ActiveState years ago to support MULTIPLICITY. I don't remember the details now. I do recall that driver authors are encouraged to avoid using the DBIS macro due to the cost. There's rarely a need now as DBI handles carry a pointer to the dbi_state structure. The DBI::DBD docs say Sorry, only skimming the conversation, so maybe this is obvious to you: PERINTERP was a prototype from the 5.005 days for the MY_CXT stuff that is now in the code. If you look at 5.8.1, then you'll see MY_CXT being almost identical to PERINTERP in DBI. But switching to MY_CXT should give you performance benefits on later Perl versions, which have an improved MY_CXT implementation. Great. Thanks Jan. Anyone want to take a shot at that? Tim.
Re: speeding up XS_DBI_dispatch()
On Tue, Jan 24, 2012 at 08:42:28PM +, Tim Bunce wrote: On Mon, Jan 23, 2012 at 12:51:12AM +, Dave Mitchell wrote: In the particular case I've been investigating, the method lookup in XS_DBI_dispatch() contributed a measurable part of the total loop time; in particular, when as a test I added very crude and hacky caching for the lookup of the inner fetch method, the execution time for the loop dropped from 23.0s to 19.2s. I wonder what that is as a % change in the cost of dispatch. Using a localhost mysql server with a table with two int columns and 20M rows, and the following code on a non-threaded, -O2 perl 5.15.6: ... my $cmd = 'select SQL_NO_CACHE id, val from t3'; $dbh-{'mysql_use_result'}=1; my $sth = $dbh-prepare($cmd) or die; my ($id, $val); $sth-bind_columns(\$id, \$val); while ($sth-fetch) { $c++; } then the CPU usage of the while loop broke down as follows: 7.0% overhead of loop, i.e. while() {$c++} 19.2% handle the outer method call, i.e. $sth-fetch() of which half is method lookup, half is assemble args ($sth) and pp_entersub 12.7% XS_DBI_dispatch() doing DBI stuff 20.5% XS_DBI_dispatch() doing method lookup and doing direct XS call to DBD::mysql::fetchrow_arrayref() 40.6% DBD::mysql::fetchrow_arrayref() fetching a row Given that my hacky cache for fetch() lookup reduced overall execution time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2% to 16.7%, i.e. nearly halves the dispatch time. The guts of the DBI are ancient and baroque with many dusty corners were dragons may, or may not, be lurking. Oh dear :-) Just out of curiosity (since I'm in the middle of trying to understand the code in XS_DBI_dispatch), what is the main purpose of having all functions share a big dispatch fn which calls the inner method, rather than having a separate function for each? * To what extent (if any) is DBI allowed to cheat when it comes to bypassing the perl API (in particular I'm thinking of the method invalidation stuff which I don't think has always been part of the API). The DBI is always allowed to cheat in the name of performance! My simple but effective xsbypass hack doesn't seem to have caused problems in the years it's been there (though it's a pity threaded perls can't use it). Just out of curiosity, why can't threaded perls use it? PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. -- Lady Nancy Astor: If you were my husband, I would flavour your coffee with poison. Churchill: Madam - if I were your husband, I would drink it.
Re: speeding up XS_DBI_dispatch()
On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: On Tue, Jan 24, 2012 at 08:42:28PM +, Tim Bunce wrote: Given that my hacky cache for fetch() lookup reduced overall execution time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2% to 16.7%, i.e. nearly halves the dispatch time. Nice. The guts of the DBI are ancient and baroque with many dusty corners were dragons may, or may not, be lurking. Oh dear :-) Just out of curiosity (since I'm in the middle of trying to understand the code in XS_DBI_dispatch), what is the main purpose of having all functions share a big dispatch fn which calls the inner method, rather than having a separate function for each? Almost all the functionality in the dispatcher (tracing, RaiseError, Callbacks, Profiling etc.) would need to be duplicated in each method. * To what extent (if any) is DBI allowed to cheat when it comes to bypassing the perl API (in particular I'm thinking of the method invalidation stuff which I don't think has always been part of the API). The DBI is always allowed to cheat in the name of performance! My simple but effective xsbypass hack doesn't seem to have caused problems in the years it's been there (though it's a pity threaded perls can't use it). Just out of curiosity, why can't threaded perls use it? To be honest I can't remember now. It's disabled #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT) Can you think of a reason? All tests pass. Maybe I was just being conservative. PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. The dPERINTERP stuff was added by ActiveState years ago to support MULTIPLICITY. I don't remember the details now. I do recall that driver authors are encouraged to avoid using the DBIS macro due to the cost. There's rarely a need now as DBI handles carry a pointer to the dbi_state structure. The DBI::DBD docs say The names Cdbis and CDBIS, which were used in previous versions of this document, should be replaced with the CDBIc_STATE(imp_xxh) macro. Hopefully most compiled drivers are following that advice now. Ah, actually that's wrong! It should say DBIc_DBISTATE(imp_xxh). Fixed in r15098. [goes to look at the DBI.xs sources] Oh, wow. There are more dPERINTERP's left than I thought. I've just eliminated a bunch of them in r15099. Several were in hot paths, including fetchrow_arrayref and handle creation and destruction. That's a useful win for threaded perl. Thanks Dave! Tim.
RE: speeding up XS_DBI_dispatch()
On Wed, 25 Jan 2012, Tim Bunce wrote: On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote: PS - I'm specifically being paid only to fix a performance issue on non-threaded builds, so I won't be looking at threaded issues. But dPERINTERP looks like bad news on threaded builds. The dPERINTERP stuff was added by ActiveState years ago to support MULTIPLICITY. I don't remember the details now. I do recall that driver authors are encouraged to avoid using the DBIS macro due to the cost. There's rarely a need now as DBI handles carry a pointer to the dbi_state structure. The DBI::DBD docs say Sorry, only skimming the conversation, so maybe this is obvious to you: PERINTERP was a prototype from the 5.005 days for the MY_CXT stuff that is now in the code. If you look at 5.8.1, then you'll see MY_CXT being almost identical to PERINTERP in DBI. But switching to MY_CXT should give you performance benefits on later Perl versions, which have an improved MY_CXT implementation. Cheers, -Jan
speeding up XS_DBI_dispatch()
[ disclaimer: I am being funded to do this work ] I've been looking into bottlenecks in a tight fetch loop with millions of rows, e.g. $sth-bind_columns(...); while ($sth-fetch) { # do a small amount of work } In the particular case I've been investigating, the method lookup in XS_DBI_dispatch() contributed a measurable part of the total loop time; in particular, when as a test I added very crude and hacky caching for the lookup of the inner fetch method, the execution time for the loop dropped from 23.0s to 19.2s. So, I would like to attempt to add a method cache to XS_DBI_dispatch() (and possibly see if anything else can be tweaked). My questions are: * Would such a patch be welcome in principle? * Any gotchas that I might not be aware of (I'm not familiar with DBI/DBD internals apart from what I've played with for the last few days). * What's the earliest perl we have to maintain compatibility with? * Does the test suite already have good coverage for method lookup (especially where methods are redefined or stashes modified, i.e. the sort of thing that would invalidate caches), or should I add some? * can I rely on _install_method() being the only way that XS CVs can get to point to XS_DBI_dispatch()? * To what extent (if any) is DBI allowed to cheat when it comes to bypassing the perl API (in particular I'm thinking of the method invalidation stuff which I don't think has always been part of the API). Thanks, Dave -- Please note that ash-trays are provided for the use of smokers, whereas the floor is provided for the use of all patrons. -- Bill Royston