subject:"speeding up XS_DBI

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-17 Thread Dave Mitchell

On Fri, Feb 10, 2012 at 10:03:06AM -0800, Jan Dubois wrote:
 On Fri, 10 Feb 2012, Tim Bunce wrote:
  On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote:
  
   If anyone knows of a more elegant way to make a function from DBI.xs
   available to DBD:: code, please let me know!
  
  I hope to take a proper look in the next day or so.
 
 The standard way nowadays seems to be to use ExtUtils::Depends. I'm
 not convinced though that it is any more elegant than the method you are
 already using.
 
 An example module using ExtUtils::Depends is B::Hooks::OP::Check.
 
 It is important to also list all exported functions in the FUNCLIST
 parameter to MakeMaker; otherwise the symbols will not be exported on
 AIX and Windows.

Is this necessary for my patch? I assumed that just by treating my
function as XS (i.e. by calling newXS), something magical would happen
somehow. After all, none of the other XS functions need a special mention
in Makefile.PL?

-- 
The crew of the Enterprise encounter an alien life form which is
surprisingly neither humanoid nor made from pure energy.
-- Things That Never Happen in Star Trek #22

RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-17 Thread Jan Dubois

On Fri, 17 Feb 2012, Dave Mitchell wrote:
 On Fri, Feb 10, 2012 at 10:03:06AM -0800, Jan Dubois wrote:
 On Fri, 10 Feb 2012, Tim Bunce wrote:
 On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote:
 If anyone knows of a more elegant way to make a function from
 DBI.xs available to DBD:: code, please let me know!

 I hope to take a proper look in the next day or so.

 The standard way nowadays seems to be to use ExtUtils::Depends. I'm
 not convinced though that it is any more elegant than the method you
 are already using.

 An example module using ExtUtils::Depends is B::Hooks::OP::Check.

 It is important to also list all exported functions in the FUNCLIST
 parameter to MakeMaker; otherwise the symbols will not be exported on
 AIX and Windows.

 Is this necessary for my patch?

No, it isn't.

 I assumed that just by treating my function as XS (i.e. by calling
 newXS), something magical would happen somehow. After all, none of the
 other XS functions need a special mention in Makefile.PL?

None of the XS functions are actually exposed to the linker, except for
the boot_* symbol, which is already treated specially by MakeMaker.

All other XS symbols are only accessed via the CV and not via the
linker/loader symbol tables.

Adding to FUNCLIST is necessary if you want to export the symbol
at the C level and have it resolved by the OS loader.

Cheers,
-Jan

undefined symbol: mro_meta_init on 5.10.0 (was: speeding up XS_DBI_dispatch())

2012-02-13 Thread Tim Bunce

On Sun, Feb 05, 2012 at 09:04:23PM +, Dave Mitchell wrote:
 
 Anyway, attached is the final patch.
 I've tested it under 5.8.1, 5.8.9, 5.14.2 and 5.15.7, with various
 permutations of threaded builds.

I've just noticed that DBI 1.617_901 has been failing on perl 5.10.0
http://matrix.cpantesters.org/?dist=DBI+1.617_901

The issue seems to be undefined symbol: mro_meta_init e.g.:
http://www.cpantesters.org/cpan/report/ba083076-540a-11e1-a48f-e7fb434ae6f1

Any ideas?

Tim.

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-13 Thread Tim Bunce

On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote:
 
 I'm assuming that these won't be applied until after the windows() fork
 issue is fixed, so think of this email more as a preview.
 
 The first fix is relatively straightforward, and is private to DBI.xs,
 since the PERINTERP macros were kind of a prototype for MY_CXT anyway.

Since the first was straightforward, and the windows fork issue has been
resolved (thanks Jan!) I've applied the first patch and uploaded a trial
release for cpantesters to chew on.

Thanks Dave.

I'll look more closely at the second this week. I won't apply it till
we've discussed any issues (which are most likely to be around how to
minimize the pain of synchronized updates of DBI and drivers). It might
even make sense to support both old and new mechanisms for a while.

There's also the mro_meta_init issue.

Tim.

Re: speeding up XS_DBI_dispatch()

2012-02-11 Thread Martin J. Evans


On 11/02/2012 00:48, Jan Dubois wrote:

On Fri, 10 Feb 2012, Dave Mitchell wrote:

On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote:

Well perls fork() relies on threaded  perl so it could very easily be
Dave's patch. Dave do you have access to a win32 build environment?

I'm afraid not.

The problem is due to the getenv() call during interpreter cloning.
On Windows Perl keeps a virtual host environment for each interpreter,
with its own cwd, and its own set of environment variables.  I don't
have time to figure out _why_ this is an issue, but maybe you can check
the environment variable during the BOOT section and just set a global
variable instead, to work around this issue?

 if (getenv(PERL_DBI_XSBYPASS))
 use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS));

On my old Windows 2000 system the test would simply abort with the
message abnormal program termination.  I was just guessing that
this is somehow triggered by the abort() function in the C runtime.
Setting a breakpoint on it confirms it, even though the stack trace
looks incomplete (I don't see PerlEnvGetenv() calling abort() directly):

MSVCRT!abort
perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462]
DBI!dbi_bootinit+0x276 [DBI.xs @ 468]
DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280]
perl514!Perl_pp_entersub+0x701 [..\pp_hot.c @ 3049]
perl514!Perl_runops_standard+0xc [..\run.c @ 41]
perl514!Perl_call_sv+0x195 [perl.c @ 2633]
perl514!perl_clone_using+0x18ad [..\sv.c @ 13422]
perl514!PerlProcFork+0x89 [perlhost.h @ 1854]
perl514!Perl_pp_fork+0x42 [..\pp_sys.c @ 4044]
perl514!Perl_runops_standard+0xc [..\run.c @ 41]
perl514!S_run_body+0xf5 [perl.c @ 2351]
perl514!perl_run+0x179 [perl.c @ 2269]
perl514!RunPerl+0xed [perllib.c @ 270]
perl!main+0x12 [perlmain.c @ 22]
perl!mainCRTStartup+0xe3
KERNEL32!BaseProcessStart+0x3d

Anyways, I would suggest trying a workaround by moving the getenv()
call outside the cloning operation.

Cheers,
-Jan



Nice find Jan. Commenting out the getenv works for me.

Martin

Re: speeding up XS_DBI_dispatch()

2012-02-11 Thread Tim Bunce

On Fri, Feb 10, 2012 at 04:48:38PM -0800, Jan Dubois wrote:
 On Fri, 10 Feb 2012, Dave Mitchell wrote:
  On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote:
   Well perls fork() relies on threaded  perl so it could very easily be
   Dave's patch. Dave do you have access to a win32 build environment?
  
  I'm afraid not.
 
 The problem is due to the getenv() call during interpreter cloning.
 On Windows Perl keeps a virtual host environment for each interpreter,
 with its own cwd, and its own set of environment variables.  I don't
 have time to figure out _why_ this is an issue, but maybe you can check
 the environment variable during the BOOT section and just set a global
 variable instead, to work around this issue?

When called from BOOT dbi_bootinit() gets passed a NULL parent_dbis
parameter, but when called from CLONE it's not null, so this might be a
good fix:

@@ -579,7 +579,10 @@
 gv_fetchpv(DBI::lasth,  GV_ADDMULTI, SVt_PV);
 gv_fetchpv(DBI::rows,   GV_ADDMULTI, SVt_PV);
 
-if (getenv(PERL_DBI_XSBYPASS))
+/* we only need to check the env var on the initial boot
+ * which is handy because it can core dump during CLONE on windows
+ */
+if (!parent_dbis  getenv(PERL_DBI_XSBYPASS))
 use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS));
 }
 
Martin, could you try that?

 Setting a breakpoint on it confirms it, even though the stack trace
 looks incomplete (I don't see PerlEnvGetenv() calling abort() directly):
 
 MSVCRT!abort
 perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462]
 DBI!dbi_bootinit+0x276 [DBI.xs @ 468]
 DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280]
 
 Anyways, I would suggest trying a workaround by moving the getenv()
 call outside the cloning operation.

Hopefully the above will do the trick. Meanwhile, aborting if getenv is
called during a CLONE sure seems like a bug worthy of a ticket for p5p.

Thanks Jan!

Tim.

Re: speeding up XS_DBI_dispatch()

2012-02-11 Thread demerphq

On 11 February 2012 14:55, Tim Bunce tim.bu...@pobox.com wrote:
 On Fri, Feb 10, 2012 at 04:48:38PM -0800, Jan Dubois wrote:
 MSVCRT!abort
 perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462]
 DBI!dbi_bootinit+0x276 [DBI.xs @ 468]
 DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280]

 Anyways, I would suggest trying a workaround by moving the getenv()
 call outside the cloning operation.

 Hopefully the above will do the trick. Meanwhile, aborting if getenv is
 called during a CLONE sure seems like a bug worthy of a ticket for p5p.

I am not so sure. During a clone, the internals are potentially in an
indeterminate state. I could see it being argued that the only thing
you should be doing during a clone is cloning, not environment
lookups.

But of course, if we can fix it that would be better. And the above is
pure speculation.

Yves

-- 
perl -Mre=debug -e /just|another|perl|hacker/

Re: speeding up XS_DBI_dispatch()

2012-02-10 Thread Dave Mitchell

On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote:
 Well perls fork() relies on threaded  perl so it could very easily be
 Dave's patch. Dave do you have access to a win32 build environment?

I'm afraid not.

-- 
All wight. I will give you one more chance. This time, I want to hear
no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.
-- Life of Brian

Re: speeding up XS_DBI_dispatch()

2012-02-10 Thread Martin J. Evans


On 10/02/12 10:01, Dave Mitchell wrote:

On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote:

Well perls fork() relies on threaded  perl so it could very easily be
Dave's patch. Dave do you have access to a win32 build environment?


I'm afraid not.



Have you any suggestion on how I could help debug this?

I do have A windows machine which exhibits the problem.

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

Re: speeding up XS_DBI_dispatch()

2012-02-10 Thread Martin J. Evans


On 10/02/12 10:54, Tim Bunce wrote:

On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote:

Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on 
Windows.


Ah, 1.616_901 and 1.616_902! So unrelated to Dave's method cache work.

See if this works:

 - static int use_xsbypass = 1;
 + static int use_xsbypass = 0;


Think I already tried that last night by setting the env var and it made no 
difference.

Don't have the machine to hand right now although I could try and set it up 
here.
 

If not, then it's probably something to do with the dPERINTERP changes.


which is my suspicion.


Running with a high trace level might shed some light but a strack trace
of some kind would be ideal.


I tried tracing but nothing stood out - I could post the end of the trace later.
 

Tim.


Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-10 Thread Dave Mitchell

On Thu, Jan 26, 2012 at 12:26:44PM +, Tim Bunce wrote:
 On Wed, Jan 25, 2012 at 05:37:13PM -0800, Jan Dubois wrote:
  On Wed, 25 Jan 2012, Tim Bunce wrote:
   On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
   
PS - I'm specifically being paid only to fix a performance issue on
non-threaded builds, so I won't be looking at threaded issues. But
dPERINTERP looks like bad news on threaded builds.
   
   The dPERINTERP stuff was added by ActiveState years ago to support
   MULTIPLICITY. I don't remember the details now. I do recall that driver
   authors are encouraged to avoid using the DBIS macro due to the cost.
   There's rarely a need now as DBI handles carry a pointer to the
   dbi_state structure. The DBI::DBD docs say
  
  Sorry, only skimming the conversation, so maybe this is obvious to
  you: PERINTERP was a prototype from the 5.005 days for the MY_CXT
  stuff that is now in the code. If you look at 5.8.1, then you'll see
  MY_CXT being almost identical to PERINTERP in DBI. But switching to
  MY_CXT should give you performance benefits on later Perl versions,
  which have an improved MY_CXT implementation.
 
 Great. Thanks Jan.
 
 Anyone want to take a shot at that?

I attach two patches; the first replaces dPERINTERP with dMY_THX,
while the second extends this to the DBIS stuff too.

The headline figure (YMMV etc) is that on a recent threaded perl, these
two patches collectively make my mysql empty test loop twice as fast!!! :

while ($sth-fetch) { $c++ }

On a perl  5.10.0 (before dMY_THX was made much more efficient), the
speedup is less spectacular, but is still about 25%.

I'm assuming that these won't be applied until after the windows() fork
issue is fixed, so think of this email more as a preview.

The first fix is relatively straightforward, and is private to DBI.xs,
since the PERINTERP macros were kind of a prototype for MY_CXT anyway.

The second patch, which makes DBIS more efficient, is a lot more complex,
and more likely to break things (especially as it's changing a bunch of
macros that are directly #included by the DBD drivers. You may need to
bump API version numbers; I don't understand that bit.

It works by adding a C function, _dbi_state_lval(), to DBI.xs which
returns a pointer to the static(ish) dbistate struct that only DBI.xs
knows about.  However, I couldn't find any way for DBD:: code to call
functions within DBI, so I did a little hack: I faked up an XS sub,
DBI::_dbi_state_lval(), whose CvXSUB points to the _dbi_state_lval
function. Since _dbi_state_lval() isn't actually an XS function,
perl-level code that tries to call DBI::_dbi_state_lval() will crash and
burn.

If anyone knows of a more elegant way to make a function from DBI.xs
available to DBD:: code, please let me know!

Anyway, at the DBD end of things, the code extracts the address of the
function from DBI::_dbi_state_lval's CvXSUB slot, and caches it in
a static var. Then in threaded builds, DBIS expands to a call to a static
function that calls  _dbi_state_lval() which returns (MY_CXT.dbi_state).
In unthreaded builds, it just returns the value of a static var as normal.

Here are some timings; remember that MY_CXT was made a lot faster in
5.10.0, so I've included timings for 5.8.9 too. As you'd expect, only the
threaded build have a significant speed-up; I think the tiny speed-ups in
the non-threaded builds are just noise.

The three timings (in sec) for the basic while($sth-fetch){$c++} loop
are for:

(1) the baseline: r15128 plus my method cache code
(2) in addition, replace dPERINTERP with dMY_CXT
(3) in addition, fix DBIS

1  2  3
- -- --
40.10  -  29.965.8.9threaded, optimised
37.62  33.98  18.965.15.7   threaded, optimised

12.85  -  12.555.8.9  unthreaded, optimised
13.41  13.46  12.975.15.6 unthreaded, optimised


The big saving between (2) and (3) is due to DBD::mysql still using DBIS;
in particular, for every fetch call.

-- 
This email is confidential, and now that you have read it you are legally
obliged to shoot yourself. Or shoot a lawyer, if you prefer. If you have
received this email in error, place it in its original wrapping and return
for a full refund. By opening this email, you accept that Elvis lives.
diff --git a/DBI.xs b/DBI.xs
index ac161f4..887111d 100644
--- a/DBI.xs
+++ b/DBI.xs
@@ -16,8 +16,6 @@
 #include sys/timeb.h
 # endif
 
-#define MY_VERSION DBI( XS_VERSION )
-
 /* The XS dispatcher code can optimize calls to XS driver methods,
  * bypassing the usual call_sv() and argument handling overheads.
  * Just-in-case it causes problems there's an (undocumented) way
@@ -277,40 +275,24 @@ static GV* inner_method_lookup(pTHX_ HV *stash, CV *cv, 
const char *meth_name)
 
 
 /* --- make DBI safe for multiple perl interpreters --- */
-/* Contributed by Murray Nesbitt of ActiveState */
-/* (This pre-dates, and should be replaced by, MY_CTX)  */
+/* Originally contributed by Murray Nesbitt of

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-10 Thread Tim Bunce

On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote:
 On Thu, Jan 26, 2012 at 12:26:44PM +, Tim Bunce wrote:
 
 I attach two patches; the first replaces dPERINTERP with dMY_THX,
 while the second extends this to the DBIS stuff too.
 
 The headline figure (YMMV etc) is that on a recent threaded perl, these
 two patches collectively make my mysql empty test loop twice as fast!!! :
 
 while ($sth-fetch) { $c++ }
 
 On a perl  5.10.0 (before dMY_THX was made much more efficient), the
 speedup is less spectacular, but is still about 25%.

That's great news!

 I'm assuming that these won't be applied until after the windows() fork
 issue is fixed, so think of this email more as a preview.

*nods*

 [...]
 
 If anyone knows of a more elegant way to make a function from DBI.xs
 available to DBD:: code, please let me know!

I hope to take a proper look in the next day or so.

 1  2  3
 - -- --
 40.10  -  29.965.8.9threaded, optimised
 37.62  33.98  18.965.15.7   threaded, optimised
 
 12.85  -  12.555.8.9  unthreaded, optimised
 13.41  13.46  12.975.15.6 unthreaded, optimised

Most excellent!

 The big saving between (2) and (3) is due to DBD::mysql still using DBIS;
 in particular, for every fetch call.

That should be a good reminder to driver authors to avoid using DBIS!

Thanks again Dave.

Tim.

RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-02-10 Thread Jan Dubois

On Fri, 10 Feb 2012, Tim Bunce wrote:
 On Fri, Feb 10, 2012 at 03:27:24PM +, Dave Mitchell wrote:
 
  If anyone knows of a more elegant way to make a function from DBI.xs
  available to DBD:: code, please let me know!
 
 I hope to take a proper look in the next day or so.

The standard way nowadays seems to be to use ExtUtils::Depends. I'm
not convinced though that it is any more elegant than the method you are
already using.

An example module using ExtUtils::Depends is B::Hooks::OP::Check.

It is important to also list all exported functions in the FUNCLIST
parameter to MakeMaker; otherwise the symbols will not be exported on
AIX and Windows.

Personally I've just added names to FUNCLIST, and then called
dlopen/dlsym or LoadLibrary/GetProcAddress on the embedding side (not
actually another module, but an app hosting a Perl interpreter).

Ideally you should be able to do this with the DynaLoader dl_load_file()
and dl_find_symbol() functions, but I found some claims (in the FFI
module docs) that this has problems on Windows. I don't know what those
problems are, but it looks like dl_load_file() on Windows will return a
handle to the current executable when the requested library cannot be
found. I have no clue why it is doing that.

So, in summary, I think the current approach isn't such a bad solution
at all. :)
Cheers,
-Jan

RE: speeding up XS_DBI_dispatch()

2012-02-10 Thread Jan Dubois

On Fri, 10 Feb 2012, Dave Mitchell wrote:
 On Thu, Feb 09, 2012 at 11:10:17PM +0100, demerphq wrote:
  Well perls fork() relies on threaded  perl so it could very easily be
  Dave's patch. Dave do you have access to a win32 build environment?
 
 I'm afraid not.

The problem is due to the getenv() call during interpreter cloning.
On Windows Perl keeps a virtual host environment for each interpreter,
with its own cwd, and its own set of environment variables.  I don't
have time to figure out _why_ this is an issue, but maybe you can check
the environment variable during the BOOT section and just set a global
variable instead, to work around this issue?

if (getenv(PERL_DBI_XSBYPASS))
use_xsbypass = atoi(getenv(PERL_DBI_XSBYPASS));

On my old Windows 2000 system the test would simply abort with the
message abnormal program termination.  I was just guessing that
this is somehow triggered by the abort() function in the C runtime.
Setting a breakpoint on it confirms it, even though the stack trace
looks incomplete (I don't see PerlEnvGetenv() calling abort() directly):

MSVCRT!abort
perl514!PerlEnvGetenv+0x13 [perlhost.h @ 462]
DBI!dbi_bootinit+0x276 [DBI.xs @ 468]
DBI!XS_DBI__clone_dbis+0x71 [DBI.c @ 4280]
perl514!Perl_pp_entersub+0x701 [..\pp_hot.c @ 3049]
perl514!Perl_runops_standard+0xc [..\run.c @ 41]
perl514!Perl_call_sv+0x195 [perl.c @ 2633]
perl514!perl_clone_using+0x18ad [..\sv.c @ 13422]
perl514!PerlProcFork+0x89 [perlhost.h @ 1854]
perl514!Perl_pp_fork+0x42 [..\pp_sys.c @ 4044]
perl514!Perl_runops_standard+0xc [..\run.c @ 41]
perl514!S_run_body+0xf5 [perl.c @ 2351]
perl514!perl_run+0x179 [perl.c @ 2269]
perl514!RunPerl+0xed [perllib.c @ 270]
perl!main+0x12 [perlmain.c @ 22]
perl!mainCRTStartup+0xe3
KERNEL32!BaseProcessStart+0x3d

Anyways, I would suggest trying a workaround by moving the getenv()
call outside the cloning operation.

Cheers,
-Jan

Re: speeding up XS_DBI_dispatch()

2012-02-09 Thread demerphq

On 9 February 2012 22:53, Martin J. Evans martin.ev...@easysoft.com wrote:
 Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI
 on Windows. The test 16destroy.t fails with a Windows error

 This application has requested the Runtime to terminate in an unusual way

 Initially I saw this in 1.617 but then I stepped backwards to find the
 problem occurred between these 2 versions. Some smokers have reported this
 problem in 5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook
 which is strawberry perl 5.12.0.

 As you'll see from the Changes file the only 2 entries added were:

   Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce)

Well perls fork() relies on threaded  perl so it could very easily be
Dave's patch. Dave do you have access to a win32 build environment?

   Added note to DBI::Profile about async queries (Marcel GrFCnauer).

 I've diffed the two versions but there were quite a lot of changes and so
 far I cannot locate the problem. I've given up for tonight but might find
 time to do more.

 I have a vastly reduced 16destroy.t which still demonstrates the problem if
 it helps anyone.

Is it possible to do something like git-bisect here?

Yves


-- 
perl -Mre=debug -e /just|another|perl|hacker/

Re: speeding up XS_DBI_dispatch()

2012-02-09 Thread Tim Bunce

On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote:
 Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI 
 on Windows. The test 16destroy.t fails with a Windows error
 
 This application has requested the Runtime to terminate in an unusual way

I'd guess that's Windows polite version of Seg Fault ot Bus Error :)
I know ~0 about windows but I seem to recall that there _might_ be
something useful in the system event logs.

 Initially I saw this in 1.617 but then I stepped backwards to find the 
 problem occurred between these 2 versions. Some smokers have reported this 
 problem in 5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook 
 which is strawberry perl 5.12.0.
 
 As you'll see from the Changes file the only 2 entries added were:
 
Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce)
Added note to DBI::Profile about async queries (Marcel GrFCnauer).
 
 I've diffed the two versions but there were quite a lot of changes and
 so far I cannot locate the problem. I've given up for tonight but
 might find time to do more.

When you get a moment could you try this change:

--- DBI.xs  (revision 15137)
+++ DBI.xs  (working copy)
@@ -3536,7 +3536,7 @@
 }
 }
 
-if (is_orig_method_name)
+if (0  is_orig_method_name)
 imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh),
 cv, meth_name);
 else

which will, I think, effectively disable the runtime side of the method
cache changes.

Tim.

Re: speeding up XS_DBI_dispatch()

2012-02-09 Thread Martin J. Evans


On 09/02/2012 22:12, Tim Bunce wrote:

On Thu, Feb 09, 2012 at 09:53:08PM +, Martin J. Evans wrote:

Something between 1.616_901 and 1.616_902 changed which breaks fork() in DBI on 
Windows. The test 16destroy.t fails with a Windows error

This application has requested the Runtime to terminate in an unusual way

I'd guess that's Windows polite version of Seg Fault ot Bus Error :)
I know ~0 about windows but I seem to recall that there _might_ be
something useful in the system event logs.

I'll look more but no luck so far.


Initially I saw this in 1.617 but then I stepped backwards to find the problem 
occurred between these 2 versions. Some smokers have reported this problem in 
5.10.1, 5.14.0 and 5.14.2. I can reproduce on my wife's netbook which is 
strawberry perl 5.12.0.

As you'll see from the Changes file the only 2 entries added were:

Enhanced performance for threaded perls (Dave Mitchell, Tim Bunce)
Added note to DBI::Profile about async queries (Marcel GrFCnauer).

I've diffed the two versions but there were quite a lot of changes and
so far I cannot locate the problem. I've given up for tonight but
might find time to do more.

When you get a moment could you try this change:

--- DBI.xs  (revision 15137)
+++ DBI.xs  (working copy)
@@ -3536,7 +3536,7 @@
  }
  }

-if (is_orig_method_name)
+if (0  is_orig_method_name)
  imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh),
  cv, meth_name);
  else

which will, I think, effectively disable the runtime side of the method
cache changes.

Tim.

nope, didn't change anything - it still failed.

Martin

Re: speeding up XS_DBI_dispatch()

2012-02-08 Thread Dave Mitchell

On Wed, Feb 08, 2012 at 10:58:37AM +, Tim Bunce wrote:
  PS - I'm currently working on bringing PERINTERP and DBIS into the
  Brave New World of MY_CXT.
 
 Wonderful!
 
 I presume that'll cause an API change that'll require compiled drivers
 to be recompiled (ie. we'd bump DBISTATE_VERSION so check_version() will
 croak).  It's been many years since a DBI upgrade has required that
 (1998 as far as I can tell from the Changes file).

I dunno. The thing that is changing from DBD's code perspective is some of
the macros; in particular, on threaded builds, DBIS_PUBLISHED_LVALUE is
now defined(*) as:

#define DBIS_PUBLISHED_LVALUE (*(dbi_state_lval(aTHX)))

where dbi_state_lval() is a new function.

When I tried doing a 'make test' of DBI against a perl with an existing
DBI and DBD::mysql installed, one of the first DBI tests (01basics.t?
02dbidrv.t?) chucked out some mysql errors, so I just deleted DBD::mysql
from the perl. Perhaps that means the version does indeed need bumping?
Anyway, you'll have a better idea once I submit the patch - I'll leave any
bumping to you.

(*) well, currently, anyway.

-- 
My Dad used to say 'always fight fire with fire', which is probably why
he got thrown out of the fire brigade.

Re: speeding up XS_DBI_dispatch()

2012-02-07 Thread Tim Bunce

On Sun, Feb 05, 2012 at 09:04:23PM +, Dave Mitchell wrote:
 On Mon, Jan 30, 2012 at 10:56:37PM +, Tim Bunce wrote:
  
  dbih_getcom2 does the first part of that when looking up DBI hande
  magic. I never added the move to the top because I figured it would be
  very rare for magic to get added to the inner hash of a DBI handle.
  It seems even less likely that someone would add magic to a DBI method,
  but you never know. Unless you can think of a plausible case I'd be
  happy if you just added the short-cut+fallback and skipped the shuffle.
 
 I'd already written the magic shuffler code by the time I saw your email,
 so unless you want me to rip it out, it's your's fro free!

:)

 Anyway, attached is the final patch.
 I've tested it under 5.8.1, 5.8.9, 5.14.2 and 5.15.7, with various
 permutations of threaded builds.
 
 It took quite a bit more work from the initial draft I showed you, mainly
 getting the threaded stuff right, and a stupid PL_generation /
 PL_sub_generation mix-up. I knew what the difference was between the two
 vars, but whenever I visually inspected the code, I kept seeing the former
 as the latter and couldn't understand why cache invalidation didn't work!

 -imp_msv = (SV*)gv_fetchmethod_autoload(DBIc_IMP_STASH(imp_xxh), 
 meth_name, FALSE);
 +if (is_orig_method_name)
 +imp_msv = (SV*)inner_method_lookup(aTHX_ DBIc_IMP_STASH(imp_xxh),
 +cv, meth_name);
 +else
 +imp_msv = (SV*)gv_fetchmethod_autoload(DBIc_IMP_STASH(imp_xxh),
 +meth_name, FALSE);
  
  /* if method was a 'func' then try falling back to real 'func' 
 method */
  if (!imp_msv  (ima_flags  IMA_FUNC_REDIRECT)) {
 @@ -3463,7 +3582,7 @@
  
 -if (!imp_msv) {
 +if (!imp_msv || !GvCV(imp_msv)) {

What's the GvCV test for?

 +++ t/31methcache.t   (revision 0)

The tests look great. Thanks.

 +BEGIN { eval use threads; }# Must be first
 +my $use_threads_err = $@;

 +# With this test code and threads, 5.8.1 has issues with freeing freed
 +# scalars, while 5.8.9 doesn't; I don't know about in-between - DAPM

issues? i.e., warnings at global destruction?

The Mac OS X perl 5.8.8 with threads that I have handy works ok.

 +my $has_threads = $Config{useithreads}  $] = 5.008009;
 +die $use_threads_err if $has_threads  $use_threads_err;

Umm.

I'm tempted to disable the cache at compile time for perl  5.8.8.

Meanwhile, I've applied the patch and uploaded a trial release
for cpantesters to chew on. Once there are sufficient reports from old
perls that have threading enabled I'll scan them for warnings.

Thanks again Dave!

Tim.

Re: speeding up XS_DBI_dispatch()

2012-02-07 Thread Dave Mitchell

On Tue, Feb 07, 2012 at 10:54:12PM +, Tim Bunce wrote:
  -if (!imp_msv) {
  +if (!imp_msv || !GvCV(imp_msv)) {
 
 What's the GvCV test for?

One of the more vicious tests in t/31methcache.t does

local $DBD::Sponge::st::{fetch};

which causes gv_fetchmeth(...,'fetch') to return a GV that has a null CV
slot (IIRC).

  +# With this test code and threads, 5.8.1 has issues with freeing freed
  +# scalars, while 5.8.9 doesn't; I don't know about in-between - DAPM
 
 issues? i.e., warnings at global destruction?

Yes, similar to those that t/35thrclone.t generates; i.e. I assume that
it's a generic issue with ithreads in early 5.8.x releases (I remember
fixing a lot of such issues, and Nicholas backporting them into
maint-5.8). So I decided to just skip the threading-related cache tests
for early 5.8.x's.

 The Mac OS X perl 5.8.8 with threads that I have handy works ok.
 
  +my $has_threads = $Config{useithreads}  $] = 5.008009;
  +die $use_threads_err if $has_threads  $use_threads_err;
 
 Umm.
 
 I'm tempted to disable the cache at compile time for perl  5.8.8.

I don't think the caching code is bringing up any new threading issues
with early 5.8.x's; i.e. the new test file t/31methcache gives the same
pattern of 'Attempt to free unreferenced scalar' warnings with or without
the cache code.

 Meanwhile, I've applied the patch and uploaded a trial release
 for cpantesters to chew on. Once there are sufficient reports from old
 perls that have threading enabled I'll scan them for warnings.

Ok thanks.

PS - I'm currently working on bringing PERINTERP and DBIS into the
Brave New World of MY_CXT.

-- 
Hofstadter's Law: It always takes longer than you expect, even when you
take into account Hofstadter's Law.

Re: speeding up XS_DBI_dispatch()

2012-01-30 Thread Dave Mitchell

On Sun, Jan 29, 2012 at 10:06:58PM +, Tim Bunce wrote:
 On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote:
  The code itself (see diff below) just attaches some magic to the outer CV
  that caches the GV of the corresponding inner method.
 
 Any reason for not using the existing 'internal method attribute'
 mechanism? _install_method() already allocates a structure and
 attaches it the the CV using CvXSUBANY(cv).any_ptr.  You could add the
 three elements of method_cache_t directly into dbi_ima_t.  Then, since
 the dispatch code already has the structure pointer (ima) handy, your
 cache code could avoid the mg_findext call and use ima-method_cache
 directly.

Well, the original reason was that I wasn't aware of dbi_ima_t :-)

However, on further reflection, I'm not sure that would work, as
ima-method_cache would contain reference-counted SV pointers that would
need to be freed when the CV is freed, and be duped when the CV is cloned
in a new thread.  But there is no specific mechanism to allow this to
happen, so I'd have to mess with CLONE (and maintain a list of special CVs
that need cloning), and attach magic that has a free() method to ensure
that ima is freed when the CV is. Which is all beginning to sound rather
complex. So I'm tempted to stick with the magic approach. I could optimise
it to assume the cache magic is always the first one (and if not, take a
slower route that finds and then moves it to the top).

That's also caused me to realise that my current code isn't threadsafe, as
it it doesn't dup the cache entries. Easily fixed, though.


-- 
Art is anything that has a label (especially if the label is untitled 1)

Re: speeding up XS_DBI_dispatch()

2012-01-30 Thread Tim Bunce

On Mon, Jan 30, 2012 at 12:48:32PM +, Dave Mitchell wrote:
 On Sun, Jan 29, 2012 at 10:06:58PM +, Tim Bunce wrote:
  On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote:
   The code itself (see diff below) just attaches some magic to the outer CV
   that caches the GV of the corresponding inner method.
  
  Any reason for not using the existing 'internal method attribute'
  mechanism? _install_method() already allocates a structure and
  attaches it the the CV using CvXSUBANY(cv).any_ptr.  You could add the
  three elements of method_cache_t directly into dbi_ima_t.  Then, since
  the dispatch code already has the structure pointer (ima) handy, your
  cache code could avoid the mg_findext call and use ima-method_cache
  directly.
 
 Well, the original reason was that I wasn't aware of dbi_ima_t :-)
 
 However, on further reflection, I'm not sure that would work, as
 ima-method_cache would contain reference-counted SV pointers that would
 need to be freed when the CV is freed, and be duped when the CV is cloned
 in a new thread.  But there is no specific mechanism to allow this to
 happen, so I'd have to mess with CLONE (and maintain a list of special CVs
 that need cloning), and attach magic that has a free() method to ensure
 that ima is freed when the CV is. Which is all beginning to sound rather
 complex.

*nods*

 So I'm tempted to stick with the magic approach. I could optimise
 it to assume the cache magic is always the first one (and if not, take a
 slower route that finds and then moves it to the top).

That would be neat. Thanks.

dbih_getcom2 does the first part of that when looking up DBI hande
magic. I never added the move to the top because I figured it would be
very rare for magic to get added to the inner hash of a DBI handle.
It seems even less likely that someone would add magic to a DBI method,
but you never know. Unless you can think of a plausible case I'd be
happy if you just added the short-cut+fallback and skipped the shuffle.

 That's also caused me to realise that my current code isn't threadsafe, as
 it it doesn't dup the cache entries. Easily fixed, though.

Ah. I scratched my head for a while looking for threadafe issues and
missed that one. I'm glad you found it.

Thanks again Dave.

Tim.

Re: speeding up XS_DBI_dispatch()

2012-01-29 Thread Dave Mitchell

On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
 then the CPU usage of the while loop broke down as follows:
 
 7.0% overhead of loop,  i.e. while() {$c++}
19.2% handle the outer method call,  i.e. $sth-fetch()
  of which half is method lookup,
  half is assemble args ($sth) and pp_entersub
12.7% XS_DBI_dispatch() doing DBI stuff
20.5% XS_DBI_dispatch() doing method lookup and doing direct
  XS call to DBD::mysql::fetchrow_arrayref()
40.6% DBD::mysql::fetchrow_arrayref() fetching a row
 
 Given that my hacky cache for fetch() lookup reduced overall execution
 time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2%
 to 16.7%, i.e. nearly halves the dispatch time.

I've now written some working caching code, that reduces CPU usage on
while ($sth-fetch()) {$c++} from 15.23s to 13.10s, which is a 14%
saving!! The test code just fetches millions of rows from a 2-int table
and then doesn't do anything real with the returned data, so a real-world
example will see a smaller saving, but I'm still very pleased with it :-)

It's not ready for use yet; it passes the DBI test suite, but:
 * I haven't tested it under ithreads or early perls yet;
 * I haven't tried it against the DBD::mysql test suite yet;
 * I haven't written any tests yet for the method cache invalidation,
e.g. if code does *DBD::mysql::st::foo = sub {...},
   will subsequent calls to $sth-foo() call the new function?

As regards that last point, can you recommend where I should place the new
tests? Is there an existing test script that they can be added to, or if a
new script needs adding, what should it be called, and is there a good
existing script to use as a template?

The code itself (see diff below) just attaches some magic to the outer CV
that caches the GV of the corresponding inner method. It also remembers
the inner stash and generation number, so that the cache can be
invalidated if the outer method is called on another class. So for
example, in the following pathological example:

while ($sth_mysql-fetch) {
$sth_oracle-fetch;
...
}

the cache in the magic attached to DBI::st::fetch will oscillate
between pointing to *DBD::mysql::st:fetch and the similar oracle function,
so no speedup will be seen. But normally, when there is a constant mapping
between the outer and inner subs, the cache will speed things up.

The caching code *does* assume that the name of the outer sub never
changes, i.e. that GvNAME(CvGV(cv)) in XS_DBI_dispatch is constant WRT cv;
is this a reasonable assumption? Otherwise, I would have to cache the sub
name too, which will slow things down.

The code makes use of non-API knowledge about PL_generation and
HvMROMETA(stash)-cache_gen, which could conceivably change in a later
perl release.

 PS - I'm specifically being paid only to fix a performance issue on
 non-threaded builds, so I won't be looking at threaded issues. But
 dPERINTERP looks like bad news on threaded builds.

I've since been told that a) I'm allowed to say who's funding me:
it's booking.com (thanks, guys!); and that b) depending on how much time I
burn on the cache issue, I may have time to look at threaded issues too,
specifically the dPERINTERP rewrite using MY_CXT - assuming that no one
beats me to it!

Dave.


Index: DBI.xs
===
--- DBI.xs  (revision 15099)
+++ DBI.xs  (working copy)
@@ -82,6 +82,8 @@
 static I32  dbi_hash _((const char *string, long i));
 static void dbih_dumphandle _((pTHX_ SV *h, const char *msg, int level));
 static int  dbih_dumpcom _((pTHX_ imp_xxh_t *imp_xxh, const char *msg, int 
level));
+static int  method_cache_free(pTHX_ SV* sv, MAGIC* mg);
+static GV*  inner_method_lookup(pTHX_ HV *stash, CV *cv, const char 
*meth_name);
 char *neatsvpv _((SV *sv, STRLEN maxlen));
 SV * preparse(SV *dbh, const char *statement, IV ps_return, IV ps_accept, void 
*foo);
 
@@ -161,6 +163,74 @@
 /* 32 bit magic FNV-0 and FNV-1 prime */
 #define FNV_32_PRIME ((UV)0x01000193)
 
+
+
+/* ext magic attached to outer CV methods to quickly locate the
+ * corresponding inner method
+ */
+
+static MGVTBL method_cache_vtbl = { 0, 0, 0, 0, method_cache_free, 0, 0, 0 };
+
+typedef struct {
+HV *stash;  /* the stash we found the GV in */
+GV *gv; /* the GV containing the inner sub */
+U32 generation; /* cache invalidation */
+} method_cache_t;
+
+static int method_cache_free(pTHX_ SV* sv, MAGIC* mg)
+{
+method_cache_t *c = (method_cache_t *)(mg-mg_ptr);
+SvREFCNT_dec(c-stash);
+SvREFCNT_dec(c-gv);
+return 0;
+}
+
+static GV* inner_method_lookup(pTHX_ HV *stash, CV *cv, const char *meth_name)
+{
+GV *gv;
+method_cache_t *c;
+MAGIC *mg = mg_findext((SV*)cv, DBI_MAGIC, method_cache_vtbl);
+
+if (mg) {
+if (  (c=(method_cache_t *)(mg-mg_ptr))
+ c-stash == stash
+

Re: speeding up XS_DBI_dispatch()

2012-01-29 Thread Tim Bunce

On Sun, Jan 29, 2012 at 06:12:50PM +, Dave Mitchell wrote:
 
 I've now written some working caching code, that reduces CPU usage on
 while ($sth-fetch()) {$c++} from 15.23s to 13.10s, which is a 14%
 saving!! The test code just fetches millions of rows from a 2-int table
 and then doesn't do anything real with the returned data, so a real-world
 example will see a smaller saving, but I'm still very pleased with it :-)

Nice.

 It's not ready for use yet; it passes the DBI test suite, but:
  * I haven't tested it under ithreads or early perls yet;
  * I haven't tried it against the DBD::mysql test suite yet;
  * I haven't written any tests yet for the method cache invalidation,
   e.g. if code does *DBD::mysql::st::foo = sub {...},
will subsequent calls to $sth-foo() call the new function?
 
 As regards that last point, can you recommend where I should place the new
 tests? Is there an existing test script that they can be added to, or if a
 new script needs adding, what should it be called, and is there a good
 existing script to use as a template?

The DBI test suite is, sadly, rather crufty. Lots of test files have far
outgrown their original scope.

I'd reccomend starting a new one, say 31methcache.t. You could use
DBD::Sponge (as many tests do, e.g. 30subclass.t) and create two
statement handles each preloaded with different data.
Then fetchrow_arrayref from one and fetchrow_hashref from the other,
for example, checking you get back the right data with the right
kind of reference. Just a thought.

 The code itself (see diff below) just attaches some magic to the outer CV
 that caches the GV of the corresponding inner method.

Any reason for not using the existing 'internal method attribute'
mechanism? _install_method() already allocates a structure and
attaches it the the CV using CvXSUBANY(cv).any_ptr.  You could add the
three elements of method_cache_t directly into dbi_ima_t.  Then, since
the dispatch code already has the structure pointer (ima) handy, your
cache code could avoid the mg_findext call and use ima-method_cache
directly.

 The caching code *does* assume that the name of the outer sub never
 changes, i.e. that GvNAME(CvGV(cv)) in XS_DBI_dispatch is constant WRT cv;
 is this a reasonable assumption?

Yes, I think so.

Though some kind of 'kill switch' to disable the cache might be handy,
in case odd things are happening and the cache is suspected.

 The code makes use of non-API knowledge about PL_generation and
 HvMROMETA(stash)-cache_gen, which could conceivably change in a later
 perl release.

Which seems like a good argument for extending the perl API to cover
that kind of functionality.

  PS - I'm specifically being paid only to fix a performance issue on
  non-threaded builds, so I won't be looking at threaded issues. But
  dPERINTERP looks like bad news on threaded builds.
 
 I've since been told that a) I'm allowed to say who's funding me:
 it's booking.com (thanks, guys!); and that b) depending on how much time I
 burn on the cache issue, I may have time to look at threaded issues too,
 specifically the dPERINTERP rewrite using MY_CXT - assuming that no one
 beats me to it!

That would be great. Thanks again Dave.

 Dave.
 
 
 Index: DBI.xs

Looks good.

Tim.

Re: speeding up XS_DBI_dispatch()

2012-01-27 Thread Tim Bunce

On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote:
 On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
   
   The DBI is always allowed to cheat in the name of performance!
   My simple but effective xsbypass hack doesn't seem to have caused
   problems in the years it's been there (though it's a pity threaded perls
   can't use it).
  
  Just out of curiosity, why can't threaded perls use it?
 
 To be honest I can't remember now.  It's disabled
 #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT)
 Can you think of a reason? All tests pass. Maybe I was just being 
 conservative.

Ah, that's USE_THREADS not USE_ITHREADS. Ancient history.

I've no idea what PERL_CAPI is. There's no reference to it in current
perl headers and only a mention in Changes5.6: Log: ensure XS_LOCKS
stuff happens *before* XSUB is entered under -DPERL_CAPI
So I'll regard that as ancient history as well.

As for PERL_OBJECT, I see that perl58delta.pod says PERL_OBJECT has been
completely removed. So that's yet more ancient history.

So it seems that xsbypass is always enabled in 5.8+ builds.

Tim.

Re: speeding up XS_DBI_dispatch()

2012-01-27 Thread Dave Mitchell

On Fri, Jan 27, 2012 at 02:53:47PM +, Tim Bunce wrote:
 On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote:
  On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
   Just out of curiosity, why can't threaded perls use it?
  
  To be honest I can't remember now.  It's disabled
  #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT)
  Can you think of a reason? All tests pass. Maybe I was just being 
  conservative.
 
 Ah, that's USE_THREADS not USE_ITHREADS. Ancient history.

Actually, USE_THREADS is normally also defined when USE_ITHREADS is
defined (the original USE_THREADS was renamed USE_5005THREADS).

 I've no idea what PERL_CAPI is. There's no reference to it in current
 perl headers and only a mention in Changes5.6: Log: ensure XS_LOCKS
 stuff happens *before* XSUB is entered under -DPERL_CAPI
 So I'll regard that as ancient history as well.
 
 As for PERL_OBJECT, I see that perl58delta.pod says PERL_OBJECT has been
 completely removed. So that's yet more ancient history.
 
 So it seems that xsbypass is always enabled in 5.8+ builds.

So, I don't think its enabled, but it sounds like it could be.

-- 
The Enterprise's efficient long-range scanners detect a temporal vortex
distortion in good time, allowing it to be safely avoided via a minor
course correction.
-- Things That Never Happen in Star Trek #21

Re: speeding up XS_DBI_dispatch()

2012-01-27 Thread Tim Bunce

On Fri, Jan 27, 2012 at 03:33:51PM +, Dave Mitchell wrote:
 On Fri, Jan 27, 2012 at 02:53:47PM +, Tim Bunce wrote:
  On Wed, Jan 25, 2012 at 08:17:27PM +, Tim Bunce wrote:
   On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
Just out of curiosity, why can't threaded perls use it?
   
   To be honest I can't remember now.  It's disabled
   #if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT)
   Can you think of a reason? All tests pass. Maybe I was just being 
   conservative.
  
  Ah, that's USE_THREADS not USE_ITHREADS. Ancient history.
 
 Actually, USE_THREADS is normally also defined when USE_ITHREADS is
 defined (the original USE_THREADS was renamed USE_5005THREADS).

Ah, ok.

  [...]
  So it seems that xsbypass is always enabled in 5.8+ builds.
 
 So, I don't think its enabled, but it sounds like it could be.

It's now enabled by default and there's an (undocumented) env var to
disable it if the need ever arises.

Tim.

dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

2012-01-26 Thread Tim Bunce

On Wed, Jan 25, 2012 at 05:37:13PM -0800, Jan Dubois wrote:
 On Wed, 25 Jan 2012, Tim Bunce wrote:
  On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
  
   PS - I'm specifically being paid only to fix a performance issue on
   non-threaded builds, so I won't be looking at threaded issues. But
   dPERINTERP looks like bad news on threaded builds.
  
  The dPERINTERP stuff was added by ActiveState years ago to support
  MULTIPLICITY. I don't remember the details now. I do recall that driver
  authors are encouraged to avoid using the DBIS macro due to the cost.
  There's rarely a need now as DBI handles carry a pointer to the
  dbi_state structure. The DBI::DBD docs say
 
 Sorry, only skimming the conversation, so maybe this is obvious to
 you: PERINTERP was a prototype from the 5.005 days for the MY_CXT
 stuff that is now in the code. If you look at 5.8.1, then you'll see
 MY_CXT being almost identical to PERINTERP in DBI. But switching to
 MY_CXT should give you performance benefits on later Perl versions,
 which have an improved MY_CXT implementation.

Great. Thanks Jan.

Anyone want to take a shot at that?

Tim.

Re: speeding up XS_DBI_dispatch()

2012-01-25 Thread Dave Mitchell

On Tue, Jan 24, 2012 at 08:42:28PM +, Tim Bunce wrote:
 On Mon, Jan 23, 2012 at 12:51:12AM +, Dave Mitchell wrote:
  In the particular case I've been investigating, the method lookup in
  XS_DBI_dispatch() contributed a measurable part of the total loop time;
  in particular, when as a test I added very crude and hacky caching for the
  lookup of the inner fetch method, the execution time for the loop dropped
  from 23.0s to 19.2s.
 
 I wonder what that is as a % change in the cost of dispatch.

Using a localhost mysql server with a table with two int columns and 20M
rows, and the following code on a non-threaded, -O2  perl 5.15.6:

...
my $cmd = 'select SQL_NO_CACHE id, val from t3';
$dbh-{'mysql_use_result'}=1;
my $sth = $dbh-prepare($cmd) or die;
my ($id, $val);
$sth-bind_columns(\$id, \$val);

while ($sth-fetch) {
$c++;
}

then the CPU usage of the while loop broke down as follows:

7.0% overhead of loop,  i.e. while() {$c++}
   19.2% handle the outer method call,  i.e. $sth-fetch()
 of which half is method lookup,
 half is assemble args ($sth) and pp_entersub
   12.7% XS_DBI_dispatch() doing DBI stuff
   20.5% XS_DBI_dispatch() doing method lookup and doing direct
 XS call to DBD::mysql::fetchrow_arrayref()
   40.6% DBD::mysql::fetchrow_arrayref() fetching a row

Given that my hacky cache for fetch() lookup reduced overall execution
time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2%
to 16.7%, i.e. nearly halves the dispatch time.

 The guts of the DBI are ancient and baroque with many dusty corners
 were dragons may, or may not, be lurking.

Oh dear :-)

Just out of curiosity (since I'm in the middle of trying to understand
the code in XS_DBI_dispatch), what is the main purpose of having all
functions share a big dispatch fn which calls the inner method, rather
than having a separate function for each?

  * To what extent (if any) is DBI allowed to cheat when it comes to
bypassing the perl API (in particular I'm thinking of the method
invalidation stuff which I don't think has always been part of the API).
 
 The DBI is always allowed to cheat in the name of performance!
 My simple but effective xsbypass hack doesn't seem to have caused
 problems in the years it's been there (though it's a pity threaded perls
 can't use it).

Just out of curiosity, why can't threaded perls use it?

PS - I'm specifically being paid only to fix a performance issue on
non-threaded builds, so I won't be looking at threaded issues. But
dPERINTERP looks like bad news on threaded builds.

-- 
Lady Nancy Astor: If you were my husband, I would flavour your coffee
with poison.
Churchill: Madam - if I were your husband, I would drink it.

Re: speeding up XS_DBI_dispatch()

2012-01-25 Thread Tim Bunce

On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
 On Tue, Jan 24, 2012 at 08:42:28PM +, Tim Bunce wrote:
 
 Given that my hacky cache for fetch() lookup reduced overall execution
 time by 16.5%, that reduces overall XS_DBI_dispatch percentage from 33.2%
 to 16.7%, i.e. nearly halves the dispatch time.

Nice.

  The guts of the DBI are ancient and baroque with many dusty corners
  were dragons may, or may not, be lurking.
 
 Oh dear :-)
 
 Just out of curiosity (since I'm in the middle of trying to understand
 the code in XS_DBI_dispatch), what is the main purpose of having all
 functions share a big dispatch fn which calls the inner method, rather
 than having a separate function for each?

Almost all the functionality in the dispatcher (tracing, RaiseError,
Callbacks, Profiling etc.) would need to be duplicated in each method.

   * To what extent (if any) is DBI allowed to cheat when it comes to
 bypassing the perl API (in particular I'm thinking of the method
 invalidation stuff which I don't think has always been part of the API).
  
  The DBI is always allowed to cheat in the name of performance!
  My simple but effective xsbypass hack doesn't seem to have caused
  problems in the years it's been there (though it's a pity threaded perls
  can't use it).
 
 Just out of curiosity, why can't threaded perls use it?

To be honest I can't remember now.  It's disabled
#if (defined USE_THREADS || defined PERL_CAPI || defined PERL_OBJECT)
Can you think of a reason? All tests pass. Maybe I was just being conservative.

 PS - I'm specifically being paid only to fix a performance issue on
 non-threaded builds, so I won't be looking at threaded issues. But
 dPERINTERP looks like bad news on threaded builds.

The dPERINTERP stuff was added by ActiveState years ago to support
MULTIPLICITY. I don't remember the details now. I do recall that driver
authors are encouraged to avoid using the DBIS macro due to the cost.
There's rarely a need now as DBI handles carry a pointer to the
dbi_state structure. The DBI::DBD docs say

The names Cdbis and CDBIS, which were used in previous versions of
this document, should be replaced with the CDBIc_STATE(imp_xxh) macro.

Hopefully most compiled drivers are following that advice now.
Ah, actually that's wrong! It should say DBIc_DBISTATE(imp_xxh).
Fixed in r15098.

[goes to look at the DBI.xs sources] Oh, wow. There are more dPERINTERP's
left than I thought. I've just eliminated a bunch of them in r15099.
Several were in hot paths, including fetchrow_arrayref and handle creation
and destruction. That's a useful win for threaded perl.

Thanks Dave!

Tim.

RE: speeding up XS_DBI_dispatch()

2012-01-25 Thread Jan Dubois

On Wed, 25 Jan 2012, Tim Bunce wrote:
 On Wed, Jan 25, 2012 at 04:14:07PM +, Dave Mitchell wrote:
 
  PS - I'm specifically being paid only to fix a performance issue on
  non-threaded builds, so I won't be looking at threaded issues. But
  dPERINTERP looks like bad news on threaded builds.
 
 The dPERINTERP stuff was added by ActiveState years ago to support
 MULTIPLICITY. I don't remember the details now. I do recall that driver
 authors are encouraged to avoid using the DBIS macro due to the cost.
 There's rarely a need now as DBI handles carry a pointer to the
 dbi_state structure. The DBI::DBD docs say

Sorry, only skimming the conversation, so maybe this is obvious to
you: PERINTERP was a prototype from the 5.005 days for the MY_CXT
stuff that is now in the code. If you look at 5.8.1, then you'll see
MY_CXT being almost identical to PERINTERP in DBI. But switching to
MY_CXT should give you performance benefits on later Perl versions,
which have an improved MY_CXT implementation.

Cheers,
-Jan

speeding up XS_DBI_dispatch()

2012-01-22 Thread Dave Mitchell

[ disclaimer: I am being funded to do this work ]

I've been looking into bottlenecks in a tight fetch loop with millions of
rows, e.g.

$sth-bind_columns(...);
while ($sth-fetch) {
# do a small amount of work
}

In the particular case I've been investigating, the method lookup in
XS_DBI_dispatch() contributed a measurable part of the total loop time;
in particular, when as a test I added very crude and hacky caching for the
lookup of the inner fetch method, the execution time for the loop dropped
from 23.0s to 19.2s.

So, I would like to attempt to add a method cache to XS_DBI_dispatch()
(and possibly see if anything else can be tweaked).

My questions are:

* Would such a patch be welcome in principle?
* Any gotchas that I might not be aware of (I'm not familiar with DBI/DBD
  internals apart from what I've played with for the last few days).
* What's the earliest perl we have to maintain compatibility with?
* Does the test suite already have good coverage for method lookup
  (especially where methods are redefined or stashes modified, i.e. the
  sort of thing that would invalidate caches), or should I add some?
* can I rely on _install_method() being the only way that XS CVs can get
  to point to XS_DBI_dispatch()?
* To what extent (if any) is DBI allowed to cheat when it comes to
  bypassing the perl API (in particular I'm thinking of the method
  invalidation stuff which I don't think has always been part of the API).

Thanks,

Dave

-- 
Please note that ash-trays are provided for the use of smokers,
whereas the floor is provided for the use of all patrons.
-- Bill Royston

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

undefined symbol: mro_meta_init on 5.10.0 (was: speeding up XS_DBI_dispatch())

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

Re: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

RE: dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

RE: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

dPERINTERP/MY_CXT (was: speeding up XS_DBI_dispatch())

Re: speeding up XS_DBI_dispatch()

Re: speeding up XS_DBI_dispatch()

RE: speeding up XS_DBI_dispatch()

speeding up XS_DBI_dispatch()

32 matches

Site Navigation

Mail list logo

Footer information