Re: forking and DBI

Jonathan Leffler Tue, 13 Nov 2001 23:00:39 -0800

CHEN SEI-LIM wrote:

> Jonathan Leffler wrote:
>
> > CHEN SEI-LIM wrote:
> >
> > > You can write a simple program to try it, rather than relying only on your supposition.
> > > If I am wrong I will say sorry to everybody.
> >
> > Since you're so adamant that it can be done, perhaps you'd care to show us the
> > code that allows it to work; then you won't have to apologize to anybody.  In
> > the first instance, I'm not actually going to be fussed about whether your code
> > is o/s independent or DBMS independent - let's see one working example for
> > Oracle, or Informix, or DB2, or Sybase.  If you have none of these, then perhaps
> > you'd reveal which database(s) you know it can be made to work with -- I presume
> > you have actually done this at least once and probably many times, so it won't
> > be particularly difficult for you to do.
>
> In UNIX, after forking, the child inherits everything from the parent except the PID:
> the same stack, heap, text, and file descriptors.


In POSIX.1-1990 (which is what I have at home; I keep the 1996 standard at work),
there are 8 items that are different between the child and the parent:

1.    The child process has a unique process ID.  The child's PID also does not match
any active process group ID.

2.    The child process has a different parent process ID (which is the process ID of
the parent process).

3.    The child process has its own copy of the parent's file descriptors.  Each of
the child's file descriptors refers to the same open file description as the
corresponding file descriptor of the parent.

[JL notes: The significance of this is that the file offset is shared, not copied: if the
child reads from a regular file through an inherited descriptor, it also moves the parent
process's current read position, and vice versa.  Sockets, some character devices, pipes
and FIFOs have no read position at all, but the effect is similar: if process A reads a
byte from a socket, process B can't read that same byte.  (With a regular file, process B
could at least seek back and read the byte that process A got.)]

4.    The child process has its own copy of the parent's open directory streams.

5.    The child process's values of tms_utime, tms_stime, tms_cutime and tms_cstime
are set to zero.

6.    File locks previously set by the parent are not inherited by the child.

7.    Pending alarms are cleared for the child process.

8.    The set of signals pending for the child process is initialized to the empty set.
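Item 3 is easiest to see with a regular file.  What follows is only a minimal sketch in
Python (POSIX only, since it relies on os.fork; the function name read_after_child is made
up for illustration): the child reads the first five bytes of a scratch file through the
descriptor it inherited, and the parent's next read through its own copy of that
descriptor carries on from where the child stopped, because both copies refer to one open
file description and therefore one file offset.

```python
import os
import tempfile


def read_after_child(data: bytes = b"0123456789", n: int = 5) -> tuple[bytes, bytes]:
    """Return (bytes the child read, bytes the parent read next) through one inherited fd."""
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)      # rewind the shared open file description
        r, w = os.pipe()                  # lets the child report what it read
        pid = os.fork()
        if pid == 0:                      # child
            try:
                os.close(r)
                os.write(w, os.read(fd, n))
            finally:
                os._exit(0)               # never fall back into the parent's code
        os.close(w)
        child_got = os.read(r, n)
        os.close(r)
        os.waitpid(pid, 0)
        parent_got = os.read(fd, n)       # continues from the child's offset
        return child_got, parent_got


if __name__ == "__main__":
    print(read_after_child())             # (b'01234', b'56789')
```

The parent never asked for the first five bytes, yet it cannot get them by reading again;
only an explicit lseek() would bring them back.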

I'm pretty sure that there are a whole bunch of extra small details that are different
in the current POSIX.1-1996 standard.

Granted, that does not mean that sockets and the like won't also work, but sockets aren't
the only way of communicating with database servers.  Informix also supports a shared
memory connection method and a stream pipe method; on some platforms it uses TLI in
preference to sockets; there are, or used to be, some other exotic protocols; and SE
generally uses plain old unnamed pipes.  AFAIK, the shared memory method also uses a
semaphore (and semaphores are one of the things in POSIX.1-1996 that will be different in
the child and the parent).  Even that may not be crippling.

Further, and more seriously, if both the child and the parent write a message to the
socket, then both will hang waiting to read the response on the socket, and there's no
guarantee that the 'correct' process will read the correct answer.  Especially if the
answers are not a single packet.  The Unix scheduler does not guarantee that you'll
repeatedly wake only the process that happens to be waiting for the right part of the
response -- it'll wake any process that happens to be waiting to read any response.
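A toy illustration of the wrong process getting the answer, in Python.  One end of a
socketpair() stands in for the shared database connection and the other end for the
server; both, and the one-line "protocol", are hypothetical stand-ins, not any real
client library.  To make the outcome deterministic rather than a race, the parent lets the
child read before it does; with a real server, the scheduler decides who wins.

```python
import os
import socket


def reply_read_by_wrong_process() -> tuple[bytes, bytes]:
    """Parent sends a request on a shared socket; return (child's read, parent's read)."""
    client, server = socket.socketpair()   # client: the shared "connection" socket
    r, w = os.pipe()                       # lets the child report what it read
    pid = os.fork()
    if pid == 0:                           # child: merely waiting for *any* response
        try:
            os.close(r)
            server.close()
            os.write(w, client.recv(100))
        finally:
            os._exit(0)
    os.close(w)
    client.sendall(b"SELECT parent_row")   # the parent asks its question...
    request = server.recv(100)             # ...the toy server reads it...
    server.sendall(b"REPLY to " + request) # ...and answers on the same connection
    child_got = os.read(r, 100)            # the child already has the parent's answer
    os.close(r)
    os.waitpid(pid, 0)
    client.setblocking(False)
    try:
        parent_got = client.recv(100)
    except BlockingIOError:
        parent_got = b""                   # nothing left for the process that asked
    client.close()
    server.close()
    return child_got, parent_got


if __name__ == "__main__":
    print(reply_read_by_wrong_process())   # (b'REPLY to SELECT parent_row', b'')
```

Nothing in the socket records which process sent the request, so nothing routes the reply
back to it.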

Further still, if the two processes happened to create statements with the same name
(because at least in Informix, each SELECT statement gets given a name, and the name
generator is not safe between multiple processes sharing a single connection -- it's
fine for multiple processes with their own connections), then the parent will create a
statement named ix_007, and then the child process might also create statement ix_007, and
then when the parent gets to execute ix_007, it will actually be the child's
statement, not its own.  The chances are it won't work very well.  If you're really
unlucky, it might be the difference between UPDATE SomeTable SET flag = 1 WHERE
key_value = ? and DELETE FROM SomeTable WHERE key_value > ?, and all hell breaks loose
when the wrong data is deleted.  Funny, database folks tend to get antsy when data
goes missing unexpectedly.  Especially since users tend to be careless about using
AutoCommit and about checking the number of rows affected by a statement (though if
AutoCommit is On, it doesn't help you to know that you operated on 3000 rows instead
of 1 -- the transaction has been committed).
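The collision follows directly from fork() copying the client library's memory.  A sketch
in Python, assuming a hypothetical per-process counter as the name generator (this is not
Informix's actual code; only the ix_NNN format is borrowed from the example above): parent
and child each "prepare" one statement after the fork, and both get the same name.

```python
import itertools
import os

# Hypothetical client-library state: a per-process counter used to name statements.
# Illustrative only; this is not Informix's real name generator.
_statement_ids = itertools.count(7)


def next_statement_name() -> str:
    return f"ix_{next(_statement_ids):03d}"


def names_after_fork() -> tuple[str, str]:
    """Parent and child each name one statement; return (parent's name, child's name)."""
    r, w = os.pipe()
    pid = os.fork()                        # the counter is copied, not shared
    if pid == 0:                           # child
        try:
            os.close(r)
            os.write(w, next_statement_name().encode())
        finally:
            os._exit(0)
    os.close(w)
    parent_name = next_statement_name()
    child_name = os.read(r, 100).decode()
    os.close(r)
    os.waitpid(pid, 0)
    return parent_name, child_name


if __name__ == "__main__":
    print(names_after_fork())              # ('ix_007', 'ix_007')
```

With separate connections the duplicate names would be harmless, because each server
session keeps its own statement namespace; on one shared connection they refer to the same
server-side statement.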

So, sure, you can communicate with the database server down the same socket.  But,
unless you go through the overhead of ensuring that the two processes are not trying
to use it concurrently, you don't know whether you are going to get coherent answers.
And the overhead of ensuring that the two processes are not trying to use it
concurrently would be far higher than the overhead of simply setting up two independent
connections.
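The cheap alternative is for each process to open its own connection after the fork (in
DBI terms, calling DBI->connect in the child rather than reusing the parent's $dbh).  A
sketch in Python, with the standard library's sqlite3 module standing in for a
client/server DBMS purely because it needs no server; the table and file names are made
up.  No connection is open at the moment of the fork.

```python
import os
import sqlite3
import tempfile


def count_after_separate_connections(db_path: str) -> int:
    """Parent and child each use their own connection; return the row count afterwards."""
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE t (who TEXT)")
    setup.commit()
    setup.close()                          # nothing is left open across the fork

    pid = os.fork()
    if pid == 0:                           # child: connects only after the fork
        status = 1
        try:
            conn = sqlite3.connect(db_path)
            with conn:                     # commits if the block succeeds
                conn.execute("INSERT INTO t VALUES ('child')")
            conn.close()
            status = 0
        finally:
            os._exit(status)

    _, wait_status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(wait_status) != 0:
        raise RuntimeError("child process failed")

    conn = sqlite3.connect(db_path)        # parent: its own, independent connection
    with conn:
        conn.execute("INSERT INTO t VALUES ('parent')")
    (count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(count_after_separate_connections(os.path.join(d, "demo.db")))  # 2
```

Each process gets its own session, its own transaction and its own statement names, so
none of the coordination problems above arise.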

> If every query is transmitted via a socket from the client library to the database engine,
> why can't it work? Can you tell me why?
> That is why I suppose processes can share $dbh.
> So why don't Oracle, Informix, DB2, or Sybase support $dbh sharing?

Because, in general, two processes want to do their own queries at their own rates,
without having to ask the other "do you mind if I start a transaction now", or "do you
mind if I do a rollback now".  Even assuming you managed to get the communications
issues resolved, the single connection can only have a single transaction open at a
time, so both the changes made by the child and those made by the parent will be in
each other's transactions - doubly an issue if AutoCommit has to be simulated by
explicitly adding a COMMIT after each statement.  Basically, it doesn't make sense to
try it, unless you have a very much more restricted set of semantics than is usual.
If the parent process will go to sleep until after the child finishes, then maybe
you'd stand a chance; except, of course, that when the child terminates, the
connection will be cleaned up, and when the connection is cleaned up in the child, the
database engine will think that the connection is no longer valid, and will object
mightily to the parent trying to use a non-existent connection.  The problems go on,
and on, and on.
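The clean-up problem can be mimicked with another toy socketpair "server" in Python.  The
assumption (labelled hypothetical in the code too) is that the child's copy of the handle
sends a DISCONNECT message on its way out, the kind of clean-up a driver might do when the
child's handle is destroyed at exit; Perl's DBI documents an InactiveDestroy attribute for
suppressing that clean-up in forked children.  The parent, unaware, then sends its next
query on the dead session.

```python
import os
import socket


def parent_query_after_child_exit() -> bytes:
    """Child 'cleans up' the shared connection on exit; return the parent's next reply."""
    client, server = socket.socketpair()   # client: the socket behind the shared handle
    pid = os.fork()
    if pid == 0:                           # child
        try:
            server.close()
            client.sendall(b"DISCONNECT\n")  # hypothetical logout sent by child clean-up
        finally:
            os._exit(0)
    os.waitpid(pid, 0)

    # Toy server: the child's DISCONNECT arrived first, so it ends the session.
    session_open = server.recv(100) != b"DISCONNECT\n"

    client.sendall(b"SELECT 1\n")          # the parent carries on as if nothing happened
    server.recv(100)                       # toy server reads the request...
    server.sendall(b"1\n" if session_open else b"ERROR: no such session\n")
    reply = client.recv(100)
    client.close()
    server.close()
    return reply


if __name__ == "__main__":
    print(parent_query_after_child_exit())  # b'ERROR: no such session\n'
```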


> Maybe they really do not want anybody to be able to do it, for business reasons.

No, it's simpler than that. They want people to get reliable results from using their
products.

There are also licencing issues -- all the world is not Open Source -- but I'm
primarily concerned with the technical results of abusing the connection, not with the
legal problems you get into.

Anyway, if you are happy to try it, and then to accept any errors that occur, that's
fine by me.  It's your database, and your problem.  You were warned that it was not a
particularly good idea.  For some reason that currently escapes me, I've detailed
quite a number of reasons why it is not a good idea.  I don't think any of the formal
reasons given are completely specific to Informix -- issues like statement names and
controlling who gets woken to read the data and so on are likely to be common to all
multi-process database systems.  The issue of locks being lost across a fork() is apt
to be traumatic for single-process database systems (I want to say embedded databases,
but it has the wrong connotations).
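Several items from the POSIX.1 list, including the lock issue, can be checked from the
child's side.  A sketch in Python, assuming a POSIX system and that it runs in the main
thread (signal.signal requires that); the function and dictionary key names are made up.
The parent takes an fcntl-style record lock and schedules an alarm, then forks: the child
sees a new PID whose parent PID is the caller, no pending alarm, and a lock it cannot take
because it belongs to the parent alone.

```python
import fcntl
import os
import signal
import tempfile


def child_view_after_fork() -> dict:
    """Fork once; report what the child sees for PIDs, alarms and the parent's lock."""
    parent_pid = os.getpid()
    with tempfile.TemporaryFile() as f:
        fcntl.lockf(f.fileno(), fcntl.LOCK_EX)        # parent holds a record lock
        old_handler = signal.signal(signal.SIGALRM, lambda *_: None)
        signal.alarm(60)                              # parent has an alarm pending
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                                  # child
            try:
                os.close(r)
                remaining = signal.alarm(0)           # seconds left on an inherited alarm
                try:                                  # try the same lock without waiting
                    fcntl.lockf(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
                    got_lock = 1
                except OSError:
                    got_lock = 0                      # refused: the lock is the parent's
                line = f"{os.getpid()} {os.getppid()} {remaining} {got_lock}"
                os.write(w, line.encode())
            finally:
                os._exit(0)
        os.close(w)
        report = b""
        while chunk := os.read(r, 256):
            report += chunk
        os.close(r)
        os.waitpid(pid, 0)
        signal.alarm(0)                               # cancel the parent's own alarm
        signal.signal(signal.SIGALRM,
                      old_handler if old_handler is not None else signal.SIG_DFL)
    child_pid, child_ppid, remaining, got_lock = map(int, report.split())
    return {
        "fork_returned": pid,
        "parent_pid": parent_pid,
        "child_pid": child_pid,
        "child_ppid": child_ppid,
        "child_alarm_remaining": remaining,
        "child_got_parent_lock": bool(got_lock),
    }


if __name__ == "__main__":
    print(child_view_after_fork())
```

A database library that relies on holding such a lock (or an alarm-based timeout) in the
parent cannot assume the child holds it too.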

And, finally, the DBI documents that it is not supported and not reliable.  What more
do you need to know about it?

--
Jonathan Leffler ([EMAIL PROTECTED], [EMAIL PROTECTED])
Guardian of DBD::Informix 1.00.PC1 -- see http://www.cpan.org/
#include <disclaimer.h>
