David Schwartz wrote:
David you are bringing completely unrelated issues into the situation.

        No, you are failing to understand my argument.



A kernel does its job of arbitration like this on a shared/dup'ed file descriptor that both processes have as fd 4:

Thread/Process A                      Kernel Lower IO Layer
====================================| ===============================

select(5, [4], NULL, NULL, {10,0})
[we enter the kernel from the application context and proceed to test
fd 4 for readability; that test is done atomically, as in the following
pseudo code:]

mutex_lock(&filp->mutex);
if (filp->events & FILP_BIT_READ_EVENT)
 select_readability_fd_four = 1;
else
 select_readability_fd_four = 0;
mutex_unlock(&filp->mutex);


                                                  Data arrives on TCP;
                                                  kernel marks the read
                                                  event:
                                   mutex_lock(&filp->mutex);
                                   filp->events |= FILP_BIT_READ_EVENT;
                                   mutex_unlock(&filp->mutex);


In practice the reporting mechanism may not work exactly like this, but if it did it would not alter its characteristics. In practice there may not even be an event bitmask in use, since most applications don't use select/poll for I/O. That makes select/poll the minority case, so implementers often choose the other approach: do all the work needed to calculate readability/writability inside the select/poll system call itself.
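
To illustrate that second approach, here is a minimal C sketch (every structure and name is hypothetical, not lifted from any real kernel) of a poll-style callback that computes readability/writability on demand from the socket's queue state instead of consulting a stored event bitmask:

/* Hypothetical sketch (no real kernel's names): readiness computed on
 * demand inside a poll-style callback instead of read from a stored
 * event bitmask. */
#include <poll.h>                    /* POLLIN / POLLOUT flag values */

struct sock_state {
    unsigned long rx_queued;         /* bytes waiting to be read */
    unsigned long tx_queued;         /* bytes buffered for transmission */
    unsigned long tx_watermark;      /* below this the socket is writable */
    int           eof;               /* peer has shut down its side */
};

/* Called with the per-socket lock already held by the select/poll core. */
static unsigned int sock_poll_mask(const struct sock_state *sk)
{
    unsigned int mask = 0;

    if (sk->rx_queued > 0 || sk->eof)
        mask |= POLLIN;              /* readable: data or end-of-stream */
    if (sk->tx_queued < sk->tx_watermark)
        mask |= POLLOUT;             /* writable: room below the water mark */

    return mask;
}

Either way the test happens while the queue state cannot change underneath it, so the ordering argument below is unaffected.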

Many target CPUs won't need mutexes, since it is possible to use machine instructions to atomically set and reset bit patterns in a (naturally aligned) memory location. On Intel i386 in GNU assembler syntax this might be "movl $0x00000001,%eax; lock orl %eax,0x012300": this loads EAX with 0x00000001 and then logically ORs it into memory location 0x012300, thus setting bit 0. For a better understanding check out atomic_set_mask() in the Linux kernel's include/asm-i386/atomic.h.
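
For reference, here is roughly the same idea in portable C using a GCC-style atomic builtin instead of hand-written inline assembly (a sketch only; the event word and the mask value are invented for this example):

#include <stdint.h>

#define READ_EVENT_BIT 0x00000001u   /* made-up event flag for this example */

static uint32_t events;              /* naturally aligned 32-bit word */

/* Atomically OR the read-event bit into the event word; same effect as
 * the "lock orl" sequence above or the old atomic_set_mask() helper. */
static void mark_readable(void)
{
    __atomic_fetch_or(&events, READ_EVENT_BIT, __ATOMIC_SEQ_CST);
}

/* Atomically load the current event word without tearing. */
static uint32_t read_events(void)
{
    return __atomic_load_n(&events, __ATOMIC_SEQ_CST);
}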

So there is a set ordering of events, a well-defined order.



The whole point is that the event-trigger mechanism and the event-test mechanism (which constitute the interaction between select/poll/read/write/etc.):

* do not cause events to be revoked once they have first been reported to an application and not yet cleared by a further read()/write();

* do not lose the posting of events during the event-setting process (the classic transactional "lost write" scenario, or race).


The kernel I/O layer: only sets readability or writability events (new data arrives for the application; the output buffer drops below its water mark, guaranteeing that some amount can be written again).

select/poll: only look at events in a read-only fashion; they do not have the ability to set or reset events.

The read() family of functions: are the only things that can clear readability events.

The write() family of functions: can clear writability events (the buffer-full condition is driven by application writes; writability is set again by the kernel's low-level I/O as the buffer drains).
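
As a concrete illustration of those roles, here is a minimal user-space sketch of a bulk write driven through that contract (the helper name and the bare-bones error handling are mine; fd is assumed to be a connected socket):

#include <sys/select.h>
#include <unistd.h>
#include <errno.h>

/* Sketch: push len bytes out through the contract above.  The kernel
 * sets writability when the output buffer is below its water mark;
 * write() is what uses that room up again. */
static ssize_t write_all_select(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);

        /* Block until the kernel reports the descriptor writable. */
        if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* Writability was reported and no write() has run since, so per
         * the contract above this accepts at least one byte without
         * blocking (a partial write is possible). */
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return (ssize_t)done;
}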



It does not matter how many processes or threads have access to the same file descriptor. It does not matter that the select() call pre-dates what we now call threading on Unix; that is irrelevant too. A process is the original form of parallel execution, a file descriptor is inherited across fork() so two processes have access to the same file descriptor, and multi-CPU machines have existed under Unix for a long time. The kernel was doing its job of arbitration then just as it does now.
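
To make the fork() point concrete, here is a minimal sketch in which a descriptor created before fork() is shared by parent and child (a socketpair stands in for an inherited TCP socket; this is an illustration only):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>

/* Sketch: a descriptor created before fork() is inherited by the child;
 * both processes then hold the same underlying open file description,
 * and the kernel arbitrates between them exactly as it always has. */
int main(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;

    pid_t pid = fork();
    if (pid < 0)
        return 1;

    if (pid == 0) {
        /* Child: writes through its inherited copy of sv[1]. */
        write(sv[1], "x", 1);
        _exit(0);
    }

    /* Parent: reads the byte the child sent on the shared socket pair. */
    char c;
    if (read(sv[0], &c, 1) == 1)
        printf("parent read '%c' written by child %d\n", c, (int)pid);
    return 0;
}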




I call again for David to point to an existing implementation of poll/select which does not conform to the above guarantees. David is claiming that:

* A readability event can disappear (after it has first been indicated by poll/select and no read()-family function, such as recvmsg()/recv(), has been called).

* A writability event can disappear (after it has first been indicated by poll/select and no write()-family function, such as sendmsg()/send(), has been called).

We are also only interested in conditions concerning file descriptors being used for bulk read/write. We don't care a donkey about accept() or any other system call and the quirks of a particular platform; that is off-topic. We don't care a donkey about unrelated theoretical situations like "what if I call close(fd)"; they are irrelevant to the original discussion.



The specifications for poll/select don't talk in terms of "non-blocking" or "blocking" of other system calls, because that does not concern the select system call. What does concern select is the "readability" and "writability" of the file descriptor.

What is meant by those terms is that the file descriptor "can do more work" during the next system call related to it. This can also mean that a partial write is possible, that an error return will be indicated, or that end-of-stream will be indicated. Think of it as: the kernel has more information to convey to the application about that file descriptor, and it is ready to tell the application.

Then, by virtue of that fact, the next read()- or write()-related call does not block, because we know there is something to do, even when the file descriptor is in blocking mode (as per fcntl(fd, F_SETFL, 0);).
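
A minimal sketch of that sequence, with the descriptor deliberately left in blocking mode (the helper name and the ten-second timeout are just for illustration):

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <fcntl.h>

/* Sketch: because select() reported readability and no read()-family
 * call has run since, the read() below returns immediately with data,
 * 0 (end-of-stream) or an error; it does not block. */
static ssize_t read_when_ready(int fd, char *buf, size_t len)
{
    fd_set rfds;
    struct timeval tv = { 10, 0 };   /* ten seconds, as in the trace above */

    fcntl(fd, F_SETFL, 0);           /* ensure blocking mode */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return -1;                   /* timeout or error: nothing to do yet */

    return read(fd, buf, len);       /* will not block: readability was reported */
}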

There is no concurrency problem with the select system call and its interaction with read/write, over and above the _NORMAL_ accepted understanding that if you have two things doing something uncoordinated with the same resource (the fd), you are going to run into trouble relying on such an event-reporting mechanism. This is a given.



Please provide evidence that your understanding is correct, taken from a current working implementation of poll/select/read/write with sockets, rather than from what is and what isn't written into a specification.


There is a reason why the select/poll man pages don't/can't talk about future system calls and blocking/non-blocking. This is why I use the phrase "by virtue of that" and recite the specification's terms "readability" and "writability". There are rather a lot of system calls that take file descriptors in their arguments, since file descriptors are the mainstay of Unix. There are also rather a lot of contexts file descriptors can be used in (file I/O, socket I/O, character device I/O, block device I/O), and you are right that they don't all conform to the same semantics. But that does not concern us here; we are only talking about socket I/O.




Darryl
