Hi Henri,

I have some questions/avenues for you to pursue:

1) What happens when you change safe=>1 to safe=>0 in this code?

2) What happens if you close the entire dbh at this point (and reopen it
later)? It's a thought.

3) Which DBD(s) have you tested this with? If more than one, does the
problem occur with all of the DBDs you have tried?  I would want to know
whether it is the DBD or the DBI code that is freaking out or getting
corrupted by the signal.  We could be dealing with stacked signal
handlers.  Check the source of the DBD you are using for calls to
signal() or sigaction().

4) Have you considered looking at the output with DBI_TRACE=n, where n
is greater than 0?  You can also turn trace on just before the statement
with

        $sth->trace(n);

5) If you are running on Linux, can you do this under strace and show us
the output? If on Sun, same question but use truss.  (I don't recall
what the equivalent is on AIX or HP-UX.)

6) Combine 4 and 5?
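
For 1), here is a rough stand-alone sketch of what the safe flag
toggles, using only the core POSIX module. It is not your DBIx::HA code
and it does not touch DBI: run_with_timeout() is a hypothetical helper
name, and the select() call is just a stand-in for the blocking
execute(). Treat it as an illustration under those assumptions, not as
a fix.

```perl
use strict;
use warnings;
use POSIX qw(sigaction SIGALRM);

# Hypothetical helper (not part of DBI or Sys::SigAction): run $code under
# a SIGALRM timeout installed with POSIX sigaction. $safe chooses deferred
# ("safe") delivery (1) or immediate delivery (0).
sub run_with_timeout {
    my ($seconds, $safe, $code) = @_;
    my $timed_out = 0;

    my $act = POSIX::SigAction->new(
        sub { $timed_out = 1; die "TIMEOUT\n" },
        POSIX::SigSet->new(SIGALRM),    # block ALRM while the handler runs
    );
    $act->safe($safe);

    my $old = POSIX::SigAction->new;    # receives the previous action
    sigaction(SIGALRM, $act, $old) or die "sigaction failed: $!";

    eval {
        alarm($seconds);
        $code->();
        alarm(0);
    };
    my $err = $@;
    alarm(0);                           # cancel the alarm on every exit path
    sigaction(SIGALRM, $old);           # put the previous handler back
    die $err if $err && !$timed_out;    # re-throw errors that are not ours
    return $timed_out;
}

# Stand-in for a blocking call: wait up to 5 seconds, time out after 1.
for my $safe (0, 1) {
    my $t = run_with_timeout(1, $safe, sub { select(undef, undef, undef, 5) });
    print "safe=$safe timed_out=$t\n";
}
```

With safe set to 1, Perl defers the handler until the interpreter gets
control back between opcodes, so it cannot run while a call is stuck
inside a C client library. With safe set to 0 the handler runs as soon
as the signal arrives, which is why I am curious whether the handle
ends up in the same state either way.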

Lincoln



On Wed, 2004-11-24 at 16:52 -0800, Henri Asseily wrote:
> I have slammed into a wall in my quest for reliable failover and high 
> availability in DBI. I don't know if this discussion should be in 
> dbi-users or dbi-dev, but here goes:
> 
> High availability necessitates a good timeout handling system. If 
> execution of an sql statement or stored procedure takes too long, one 
> should have the opportunity to kill it and fail over to a less 
> overloaded server.
> 
> One problem is in the timeout handling in Perl (and Unix in general). 
> The standard $SIG{ALRM} technique utterly fails when trying to trap 
> $sth->execute(), and never gets triggered.
> That problem has now been resolved thanks to Lincoln Baxter's excellent 
> Sys::SigAction module (at least for Unix machines) which utilizes all 
> the techniques (POSIX sigaction, SIGALRM...) to ensure proper signal 
> handling.
> 
> But there's another more subtle problem that I only today finally 
> managed to get to the bottom of:
> 
> Assuming you use Sys::SigAction and you properly trap the execute() 
> call, you get nailed by DBI's aggressive sanity checking.
> 
> Suppose you have code like the following (copied from my upcoming 
> DBIx::HA 0.9x module):
> 
> eval {
>     my $h = set_sig_handler(
>              'ALRM',
>              sub { $timeout = 1; die 'TIMEOUT'; },
>              { mask=>['ALRM'],
>              safe=>1 }
>            );
>     alarm(10);
>     $res = $sth->SUPER::execute;
>     alarm(0);
> };
> alarm(0);
> 
> 
> If the alarm is triggered, then your statement handle ($sth) gets 
> automatically corrupted with no way to get rid of it. This in turn will 
> continuously add active kids to your database handle and corrupt everything.
> Below is the result of triggering the above alarm:
> 
>   null:        (in cleanup) dbih_setup_fbav: invalid number of fields: 
> -1, NUM_OF_FIELDS attribute probably not set right at ....
> 
>   null: DBI handle 0xabf1038 cleared whilst still active at ...
> 
> null: DBI handle 0xabf1038 has uncleared implementors data at ...
>      dbih_clearcom (sth 0xabf1038, com 0xaeb79b8, imp DBD::Sybase::st):
>         FLAGS 0x180057: COMSET IMPSET Active Warn ChopBlanks PrintWarn
>         PARENT DBIx::HA::db=HASH(0xa21e008)
>         KIDS 0 (0 Active)
>         IMP_DATA undef
>         LongReadLen 32768
>         NUM_OF_FIELDS -1
>         NUM_OF_PARAMS 0
> 
> 
> The statement handle was created but was never populated with the 
> execution results, so it's in a weird half-alive state.
> For example, the DBIc_NUM_FIELDS is -1, which makes dbih_setup_fbav() 
> croak. Similarly, DBIc_ACTIVE is still true.
> 
> Should there be an additional field for a handle that tells us if it's 
> not in a fully active state, and if so then we have carte blanche to 
> wipe it?
> What's the best strategy to deal with these zombies?
> 
> I can provide a patch when I dig deeper.
> 
> H