On 20/01/11 09:55, Tim Bunce wrote:
> On Wed, Jan 19, 2011 at 10:25:05PM +0000, Martin J. Evans wrote:
>> On 19/01/2011 11:47, Tim Bunce wrote:
>>> 
>>> - I think it's reasonable for execute_array() and
>>> execute_for_fetch() to return an error (ie err() true and so
>>> trigger RaiseError etc.) if execution of any of the tuples
>>> encountered an error.

Stepping back a minute and ignoring what the DBI docs say and what DBI's 
execute_array and DBD::Oracle's execute_array do right now: what does someone 
using it actually want? Take the simple case of a bulk insert of N rows as an 
example (see the sketch after this list). What could happen is:

1. All are executed and all succeed.
We want a success status, and it would not hurt to get a count back (although 
for what that count is, see later).

2. All are executed but some fail.
We want to know that all were executed but some failed, and we want to be able 
to identify the failed rows.

3. The initial rows succeed, then one errors and the remainder are not 
executed.
We want to know which ones were executed, which ones failed, and that the 
remainder were never executed. For DBI's execute_array this never happens.
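
For concreteness, here's a minimal sketch of the kind of bulk insert I mean 
(the table name, columns and data are made up):

    my $sth = $dbh->prepare("insert into mytable (id, name) values (?, ?)");
    my @tuple_status;
    my $tuples = $sth->execute_array(
        { ArrayTupleStatus => \@tuple_status },
        [ 1, 2, 3 ],                # values for placeholder 1, one per row
        [ 'one', 'two', 'three' ],  # values for placeholder 2, one per row
    );
    # Case 1 above: $tuples == 3 and every @tuple_status element is a
    # rows-affected count. Cases 2 and 3 are where the questions start.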

In ODBC, we know almost all of this because:

a) SQLExecute returns SQL_SUCCESS if everything worked and 
SQL_SUCCESS_WITH_INFO if some failed.

b) the parameter status array contains a value for each row, which may be:

SQL_PARAM_SUCCESS - executed and successful
SQL_PARAM_SUCCESS_WITH_INFO - executed but there are warnings (e.g., data 
truncated)
SQL_PARAM_ERROR - executed and errored
SQL_PARAM_UNUSED - this set of parameters was never used, perhaps because the 
DBMS aborted due to an error on a previous set of parameters
SQL_PARAM_DIAG_UNAVAILABLE - information is not available per parameter set 
because the DBMS treats the entire batch as one (I've never seen this myself)

In ODBC we don't know the rows affected per row of parameters, as SQLRowCount 
only returns a total and there is no per-parameter-row attribute for rows 
affected.

The count is a tricky one, as for execute it is the number of rows affected 
and not a count of statements executed (I know Oracle does not know this on a 
per-row basis in a batch, but some other APIs/DBs might - although not ODBC, 
to my knowledge). When array context was added to DBI and DBD::Oracle, the 
first value returned was the normal $tuples (rows executed) but the second 
value was supposed to be the sum of the rows affected for each row in the 
batch (I added that as I wanted to avoid counting them all up from 
ArrayTupleStatus). It appears that even in the DBI version, if some of the 
rows in the batch fail, the sum of affected rows is returned as undef (oops).
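
For reference, the list-context form as it stands (a sketch; $sth and the 
bind data as in the earlier example):

    my ($tuples, $rows) = $sth->execute_array(
        { ArrayTupleStatus => \@tuple_status },
        [ 1, 2, 3 ],
        [ 'one', 'two', 'three' ],
    );
    # $tuples - number of tuples executed, or undef if any failed
    # $rows   - sum of rows affected over the whole batch (which is what
    #           saves us adding up the counts in @tuple_status ourselves)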

So where are we now with DBI?

> 
> Umm. I take that back. I suggest we change the spec from:
> 
> When called in scalar context the execute_array() method returns the 
> number of tuples executed, or undef if an error occurred. Like
> execute(), a successful execute_array() always returns true 
> regardless of the number of tuples executed, even if it's zero. If
> there were any errors the ArrayTupleStatus array can be used to 
> discover which tuples failed and with what errors.

Good:

We know if all succeed or some/all fail (from the return value: N or undef).

We know which rows in the batch failed from ArrayTupleStatus.

Because it errors if any row errors, we can also use HandleError.

Bad:

We don't really know the total number of tuples executed, because if some fail 
execute_array will return undef. We know the ones marked in ArrayTupleStatus 
as failed were executed, and we know the ones up to the last failure were 
executed, but we don't know about the ones after the last failure :-(

> to either (new or changed text in capitals):
> 
> a:  When called in scalar context the execute_array() method returns
> the number of tuples executed INCLUDING ANY TUPLES WHICH FAILED, or
> undef if an error occurred THAT PREVENTED NORMAL OPERATION OF THE 
> METHOD. THE FAILURE OF SOME OR ALL TUPLES IS NOT REGARDED AS AN
> ERROR. Like execute(), a successful execute_array() always returns
> true regardless of the number of tuples executed, even if it's zero. 
> If there were any errors the ArrayTupleStatus array can be used to 
> discover which tuples failed and with what errors.

Good:

We know how many were executed (including failed ones).

We know which rows in the batch failed from ArrayTupleStatus.

Bad:

We don't know if all succeeded or some failed without checking 
ArrayTupleStatus (see the sketch below).

If some fail, the HandleError function is not called.
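
So under (a) a caller would have to do something like this just to notice a 
partial failure (a sketch; @bind_columns is hypothetical):

    my @tuple_status;
    my $tuples = $sth->execute_array({ ArrayTupleStatus => \@tuple_status },
                                     @bind_columns);
    # failed tuples appear as [err, errstr, state] array refs; successful
    # ones as a rows-affected count (or -1 if unknown)
    my @failed = grep { ref $tuple_status[$_] } 0 .. $#tuple_status;
    warn scalar(@failed), " tuples failed, first error: ",
         $tuple_status[$failed[0]][1], "\n" if @failed;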

> or
> 
> b:  When called in scalar context the execute_array() method returns
> the number of tuples executed IF ALL ARE SUCCESSFUL. IF ANY TUPLES 
> ENCOUNTER AN ERROR THEN execute_array() RETURNS UNDEF. Like
> execute(), a successful execute_array() always returns true 
> regardless of the number of tuples executed, even if it's zero. If
> there were any errors the ArrayTupleStatus array can be used to 
> discover which tuples failed and with what errors.

Good:

We know if all succeed or some/all fail (from the return value: N or undef).

We know which rows in the batch failed from ArrayTupleStatus.

Because it errors if any row errors, we can also use HandleError.

For DBI's execute_array we know all were executed, because that is what it 
does.

Bad:

For DBD::Oracle's execute_array we don't know which rows were executed if 
some failed.

> I prefer (a) but (b) is how the DBI's default execute_array() and 
> execute_for_fetch() behave, which is obviously significant.

I don't like (a) because to find out if all were successful I have to look 
into ArrayTupleStatus, and it does not call HandleError.

I don't mind (b) so long as it is DBI's execute_array, as then I know all rows 
are executed. For DBD::Oracle (b) is a problem, because we don't know which 
rows were executed if one or more fail.

To me, scalar context needs to tell me whether execute_array was 100% 
successful or not so I can decide what to do, and I'd prefer it to error if 
not 100% successful so that HandleError is called. Most people are going to 
want it all to succeed, so it should be obvious when it did; if it fails, many 
people will simply rollback (which they can do without resorting to searching 
ArrayTupleStatus) - as sketched below.
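
That is, the common pattern would just be something like this (a sketch, 
assuming execute_array errors - and so honours RaiseError - when any tuple 
fails; @bind_columns is hypothetical):

    my @tuple_status;
    eval {
        $sth->execute_array({ ArrayTupleStatus => \@tuple_status },
                            @bind_columns);
        $dbh->commit;
        1;
    } or do {
        $dbh->rollback;  # no need to dig through @tuple_status just to bail
    };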

So we are left with those people who might expect to see a row fail and want 
to do something about it. As it stands with DBI's execute_array they know all 
rows were executed and which ones failed - no problem. For DBD::Oracle they 
are stuck now anyway, as they don't know which rows were executed (other than 
the failed ones, and then only if the driver stopped on the first failure). 
This is where array context comes in: we can leave the first return value as 
it is (all N succeeded, or undef) and make the second value the number of 
rows executed :-)
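
A sketch of those proposed semantics (note: this is the suggestion, not 
current DBI behaviour):

    my ($tuples, $executed) = $sth->execute_array(
        { ArrayTupleStatus => \@tuple_status }, @bind_columns);
    if (!defined $tuples) {
        # some tuples failed, but $executed would still tell us how many
        # were actually executed, and @tuple_status which of those failed
    }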

I think DBD::Oracle's execute_array (as shown by my test script - which could 
be a new Oracle or DBD::Oracle bug) is totally flawed and hugely dangerous, as 
code typically does one of these:

RaiseError is set:

    $sth->execute_array(\%attr, @bind_columns);
    # oops - never raises an error EVER, even if some rows fail!

RaiseError is set (as above, so totally meaningless):

    my $ret;
    eval {
        $ret = $sth->execute_array(\%attr, @bind_columns);
        1;
    };
    die "error" if $@ or !$ret;
    # ok - will die on a total or a partial error, but you only know
    # the error rows and not which ones succeeded

DBIx::Class expected an error to be raised (which did not happen), but their 
code also looked at the value of err, and that is never set with DBD::Oracle 
- oops.
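
A simplified sketch of that kind of check (@bind_columns hypothetical again):

    my @tuple_status;
    my $tuples = $sth->execute_array({ ArrayTupleStatus => \@tuple_status },
                                     @bind_columns);
    if ($sth->err) {   # never true here with DBD::Oracle - that's the bug
        die "execute_array failed: " . $sth->errstr;
    }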

>> The way I see it is that if some of the rows in the batch end up
>> in the table and some don't I'd expect a warning. However, if 1 or
>> more rows fail and no rows end up in the table I don't see any
>> success so I'd expect an error. In my example code, 1 row fails but
>> no rows end up successful so it is the latter.
> 
> Shouldn't the number of rows "failing" and the number not ending up
> in the database be the same? Anything else seems like a bug. Or are
> you refering to a batch being aborted early?

You'd expect so, but:

From my current test script, they either all work (and appear in the table) or 
some fail and none end up in the database. It is not even aborting on the 
first failure as my error row is row 2 and row 1 doesn't appear.

> Regarding the batch being aborted early, how about this new spec?:
> 
> Some databases may abort the batch early after one or more errors. In
> which case the number of elements in ArrayTupleStatus, would be less
> than expected.

Ah, interesting - I hadn't thought of that. I think I still prefer array 
context to give you the status and the executed count, but I'd have no problem 
with this as well. What I don't want to have to do is count the elements in 
ArrayTupleStatus, or examine them all, to work out whether everything 
succeeded.
 
> (That's assuming we go with (b) above. If (a) then execute_array()
> would return the reduced tuple count.)

See above for my arguments for (b) with a twist.

>> It may be a change/bug in the Oracle libraries as I thought when
>> you execute a batch all the rows are executed no matter if some
>> fail and all successful ones end up in the table.
> 
> I wonder if we could run an older DBD::Oracle against an Oracle 9
> (say) database to recreate the (presumably) original behaviour.

I've only got Oracle 11 and maybe a 10 around somewhere, but I will try 
stepping back DBD::Oracle and the Oracle Instant Client.

Martin
-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

>>> - That'll need a spec clarification and a clear warning in
>>> Changes.
>>> 
>>> - We should avoid breaking existing batch apps that use
>>> DBD::Oracle.
>>> 
>>> - I'm hopeful that the above change wouldn't. (John?)
>>> 
>> So am I. If we could get the examples John has seen via his 
>> DBD::Oracle maintainership or via Pythian customers I would be
>> happy to create test cases. We are running blind at the moment as
>> we've not got those solid examples of supposedly working code.
> 
> Agreed. John, can you get some working examples so we can see what 
> error handling is used and what the impact of any changes above would
> be?
> 
>>> - We should review other database APIs that provide batch
>>> execution in order to spec a reasonable common subset behaviour
>>> for the DBI.
>>> 
>> Obviously ODBC and JDBC do batched statements. I might provide a 
>> JDBC example but for now I've done an ODBC example (slightly more 
>> familiar to me) - see below.
> 
> ODBC is handy, thanks. I'd also like to see how more native database 
> APIs handle these issues. Anyone know what other native database
> APIs support batches?
> 
>>> - Clearly we should bring DBD::Oracle, the DBI default behaviour,
>>> and the DBI spec into agreement with each other.
>>> 
>> Exactly. This is really my main point. As it stands (and given it
>> is not a bug in Oracle) I see a nightmare for anyone trying to use 
>> execute_array in a database neutral way as with DBI, all
>> successful rows are inserted and we know which ones failed and with
>> DBD::Oracle no rows are inserted (including the ones where there is
>> no error) and you cannot commit the good ones and it is difficult
>> to know (if not impossible) what really happened. This is not a dig
>> at anyone in particular as I added the array context execute_array
>> to DBI/DBD::Oracle but this is also why I suspect something has
>> changed in DBD::Oracle/Oracle.
> 
> Not being able to tell what happened is clearly a bug in
> DBD::Oracle.
> 
>>> - We *really* need a way to share tests across drivers. Perhaps
>>> something like a separate DBI::TestSuite distro that the DBI and
>>> drivers could have as a prerequisite. That would contain tests
>>> in modules.  The DBI and DBDs would have a test file that uses
>>> the DBI::TestSuite module and calls a function that runs the
>>> tests. This issue could provide the first test.
>>> 
>>> Tim.
>> 
>> I agree and I seem to remember a project to do something like this
>> - was it perhaps a google summer of code suggestion? But it is
>> pretty difficult and I think that puts a lot of people off. I
>> briefly looked at Test::Database so I could get more realistic test
>> results for DBD::ODBC but I ran in to a load of problems as
>> Test::Database needs some DBD methods writing and expects to be
>> able to create a database and in ODBC (via dozens of ODBC drivers)
>> there is not single way to do this. The gain was just not worth the
>> pain for me. I'd be happy to help someone do this but only in a
>> minor way as right now I cannot find the time to satisfy even half
>> of my OS commitments (as an example, I REALLY want to be able to
>> set handle attributes on methods in DBI [post from a week back] but
>> I just cannot find time to do it - something else is always
>> cropping up).
> 
> Having Test::Database know how to work with only a very small set of 
> common ODBC drivers would still be a big help.
> 
>> It has 4 tests:
>> 
>> 1. insert a batch successfully, reading the parameter status array
>>    (ArrayTupleStatus) to see what worked (autocommit)
>> 2. insert a batch where 2 rows cannot be inserted because of a
>>    duplicate key, and with a parameter status array (autocommit)
>> 3. insert a batch where 2 rows cannot be inserted because of a
>>    duplicate key, and without a parameter status array (autocommit)
>> 4. insert a batch where 2 rows cannot be inserted, with an explicit
>>    txn and with a parameter status array
> 
>> What it tells us is:
>> 
>> o SQL_SUCCESS is always returned when all the rows are inserted
>> o SQL_SUCCESS_WITH_INFO is returned if some of the rows were
>>   successful but some were not
> 
> Which corresponds with (a) above, but doesn't match existing DBI
> default behaviour (b).
> 
>> o it does not matter whether we provide a parameter status array
>>   (ArrayTupleStatus) or not - the 2 results above stand, i.e. even
>>   if the ODBC driver cannot tell you which ones failed (because you
>>   did not give a parameter status array) it still does the
>>   successful rows and only returns SQL_SUCCESS_WITH_INFO if some
>>   failed
>> o AutoCommit makes no difference - i.e., if autocommit is on or off
>>   the end result is the same IF we commit afterwards
> 
> Tim.
