Re: [Python-Dev] thread semantics for file objects

2005-03-18 Thread Paul Moore
On Fri, 18 Mar 2005 07:57:25 +0100, Martin v. Löwis
[EMAIL PROTECTED] wrote:
 The guarantee that we want to make is certainly stronger: if the
 threads all read from the same file, each will get a series of chunks.
 The guarantee is that it is possible to combine the chunks in a way to
 get the original contents of the file (i.e. not only the sum of the
 bytes is correct, but also the contents).

That would be a useful property to be able to rely on, certainly.
(Although in practical terms, probably a lot less than people would
*like* to see guaranteed :-))
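
To make the property concrete, here is a minimal sketch in present-day Python 3 (not the 2.4 file object under discussion). It assumes an explicit lock around each tell()/read() pair so that the recorded offsets can be trusted; whether the same holds without the lock is exactly the open question. The names read_in_chunks and demo are hypothetical.

```python
import os
import tempfile
import threading


def read_in_chunks(path, n_threads=4, chunk_size=7):
    """Let several threads drain one shared file object.

    Returns a list of (offset, chunk) pairs in whatever order the
    threads happened to obtain them.
    """
    pieces = []
    lock = threading.Lock()  # assumption: explicit lock, so offsets are reliable

    with open(path, "rb") as f:
        def worker():
            while True:
                with lock:
                    offset = f.tell()
                    chunk = f.read(chunk_size)
                    if chunk:
                        pieces.append((offset, chunk))
                if not chunk:  # EOF reached
                    return

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return pieces


def demo():
    data = bytes(range(256)) * 40
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        pieces = read_in_chunks(path)
    finally:
        os.remove(path)
    # "Combining the chunks" here means ordering them by file offset.
    rebuilt = b"".join(chunk for _, chunk in sorted(pieces))
    return data, rebuilt
```

Sorting by offset is only one way of combining the chunks; the guarantee as stated merely requires that some such ordering exists.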

 However, I see little value adding this specific guarantee to the
 documentation when so many other aspects of thread interleaving
 are unspecified.

I'm not sure I agree. It's an improvement in the situation, so why not
add it? It may even encourage others, when thinking about threading
issues, to consider whether the documentation should guarantee
anything - and if so, to add that guarantee. Over time, the
documentation gets better at describing thread-related behaviour - and
correspondingly, people get (somewhat) more confident that where the
documentation doesn't guarantee things, it's because there is a good
reason.

 For example, if a thread reads a dictionary simultaneous to a write
 in another thread, and the read and the write deal with different
 keys, there is a guarantee that they won't affect each other. If they
 operate on the same key, the read either gets the old value, or the
 new value, but not both.

If this is a genuine guarantee, then let's document it! I asked about
precisely this issue on python-list a long while ago, and no-one could
provide me with a confident answer (I couldn't be sure myself, my head
explodes when I try to understand thread-related code). The only
confident answer I got was "you're safe if you use a lock", but taking
that position to extremes results in massive levels of unnecessary
serialisation.
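
A small sketch of the dictionary case, assuming CPython, where a single dict lookup or single-key assignment is treated as atomic in practice (this is implementation behaviour, not a documented language guarantee). The function name run and the key names are hypothetical.

```python
import threading


def run(iterations=20000):
    """One thread rebinds keys while another reads one of them, with no lock."""
    shared = {"a": "old", "b": 0}
    seen = set()

    def writer():
        for i in range(iterations):
            shared["a"] = "new" if i % 2 else "old"  # rebinding one key
            shared["b"] = i                          # a different key

    def reader():
        for _ in range(iterations):
            # A single lookup yields some complete value, never a mixture.
            seen.add(shared["a"])

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, seen
```

Anything that needs two operations to stay consistent with each other (a check followed by an update, say) still needs a lock; the sketch only covers single operations.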

 Writing down all these properties does little good, IMO.

Not a huge amount of good, certainly. But no harm, and a little bit of
direct good, and also some indirect good in terms of making it clear
that the issue has been thought about. I suppose what I am saying is that
there is a practical difference between "undefined" and "unknown",
even if there isn't a theoretical one...

Of course, there's an implied requirement here to confirm any
documented guarantees in Jython, and IronPython, and PyPy, and... But
given that none of these (yet) implement the full Python 2.4 language
definition, as far as I am aware, it's probably not sensible to get
too hung up on this fact (although confirming that a guarantee doesn't
cause major implementation difficulties would be reasonable).

Paul.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] thread semantics for file objects

2005-03-18 Thread Jeremy Hylton
On Fri, 18 Mar 2005 07:57:25 +0100, Martin v. Löwis
[EMAIL PROTECTED] wrote:
 Writing down all these properties does little good, IMO. This includes
 your proposed property of file reads: anybody reading your statement
 will think "of course it works this way - why even mention it".

The things that are so obvious they don't need to be written down are
often the most interesting things to write down.  In fact, you started
the thread by saying there were no guarantees whatsoever and chiding
me for asking if there were any.  But it seems there are some intended
semantics that are stronger than what you would find in C or Perl.
Hence, I don't think they would be obvious to anyone who comes to
Python from one of those languages.

I agree that the semantics of multi-threaded Python programs is an
enormous domain and we're discussing a tiny corner of it.  I agree
that it would be quite challenging to get better documentation or
specifications here.  But I also think that every little bit helps.

Jeremy


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Aahz
On Thu, Mar 17, 2005, Jeremy Hylton wrote:

 Are the thread semantics for file objects documented anywhere?  I
 don't see anything in the library manual, which is where I expected to
 find it.  It looks like read and write are atomic by virtue of fread
 and fwrite being atomic.

Uncle Timmy will no doubt agree with me: the semantics don't matter.
NEVER, NEVER access the same file object from multiple threads, unless
you're using a lock.  And even using a lock is stupid.
-- 
Aahz ([EMAIL PROTECTED])   * http://www.pythoncraft.com/

The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code -- 
not in reams of trivial code that bores the reader to death.  --GvR


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Jeremy Hylton
On Thu, 17 Mar 2005 16:25:44 -0500, Aahz [EMAIL PROTECTED] wrote:
 On Thu, Mar 17, 2005, Jeremy Hylton wrote:
 
  Are the thread semantics for file objects documented anywhere?  I
  don't see anything in the library manual, which is where I expected to
  find it.  It looks like read and write are atomic by virtue of fread
  and fwrite being atomic.
 
 Uncle Timmy will no doubt agree with me: the semantics don't matter.
 NEVER, NEVER access the same file object from multiple threads, unless
 you're using a lock.  And even using a lock is stupid.

I'm not looking for your permission or approval.  I just want to know
what semantics are intended.  If the documentation wants to say that
the semantics are undefined, that's okay, although I think we need to say
more because some behavior has been provided by the implementation for
a long time.

Jeremy


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Jeremy Hylton
On Thu, 17 Mar 2005 23:04:16 +0100, Martin v. Löwis
[EMAIL PROTECTED] wrote:
 Jeremy Hylton wrote:
 Are the thread semantics for file objects documented anywhere?  I
 don't see anything in the library manual, which is where I expected to
 find it.  It looks like read and write are atomic by virtue of fread
 and fwrite being atomic.
 
 Uncle Timmy will no doubt agree with me: the semantics don't matter.
 NEVER, NEVER access the same file object from multiple threads, unless
 you're using a lock.  And even using a lock is stupid.
 
 
  I'm not looking for your permission or approval.
 
 Literally, the answer to your question is no. In fact, Python does not
 specify *any* interleaving semantics for threads whatsoever. The only
 statement to this respect is

I'm surprised that it does not, for example, guarantee that reads and
writes are atomic, since CPython relies on fread and fwrite which are
atomic.

Also, there are other operations that go to the trouble of calling
flockfile().  What's the point if we don't provide any guarantees?
<0.6 wink>.  If it is not part of the specified behavior, then I
suppose it's a quality of implementation issue.  Either way it would
be helpful if the Python documentation said something, e.g. "you can
rely on readline() being threadsafe" or "you can't, but the current
CPython implementation happens to be".

readline() seemed like an interesting case because readlines() doesn't
have the same implementation and the behavior is different.  So, as
another example, you could ask whether readlines() has a bug or not.

Jeremy


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Jeremy Hylton
On Thu, 17 Mar 2005 17:13:05 -0500, Tim Peters [EMAIL PROTECTED] wrote:
 [Jeremy Hylton]
  Are the thread semantics for file objects documented anywhere?
 
 No.  At base level, they're inherited from the C stdio implementation.
 Since the C standard doesn't even mention threads, that's all
 platform-dependent.  POSIX defines thread semantics for file I/O, but
 fat lot of good that does you on Windows, etc.

Fair enough.  I didn't consider Windows at all or other non-POSIX platforms.  

 
  I don't see anything in the library manual, which is where I expected to
  find it.  It looks like read and write are atomic by virtue of fread
  and fwrite being atomic.
 
 I wouldn't consider this as more than CPython implementation accidents
 in the cases it appears to apply.  For example, in universal-newlines
 mode, are you sure f.read(n) always maps to exactly one fread() call?

Universal newline reads and get_line() both lock the stream if the
platform supports it.  So I expect that they are atomic on those
platforms.

But it certainly seems safe to conclude this is a quality of
implementation issue.  Otherwise, why bother with the flockfile() at
all, right?  Or is there some correctness issue I'm not seeing that
requires the locking for some basic safety in the implementation.

 And even using a lock is stupid.
 
 ZODB's FileStorage is bristling with locks protecting multi-threaded
 access to file objects, therefore that can't be stupid.  QED

Using a lock seemed like a good idea there and still seems like a good
idea now :-).

jeremy


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Aahz
On Thu, Mar 17, 2005, Tim Peters wrote:

 I think Aahz was on target here:
 
 NEVER, NEVER access the same file object from multiple threads, unless
 you're using a lock.
 
 And here he went overboard:
 
 And even using a lock is stupid.
 
 ZODB's FileStorage is bristling with locks protecting multi-threaded
 access to file objects, therefore that can't be stupid.  QED

Heh.  And how much time have you spent debugging race conditions and
such?  That's the thrust of my point, same as we tell people to avoid
locks and use Queue instead.  I know that my statement isn't absolutely
true in the sense that it's possible to make code work that accesses
external objects across threads.  (Which is why I didn't garnish that
part with emphasis.)  But it's still stupid, 95-99% of the time.

Actually, I did skip over one other counter-example: stdout is usually
safe across threads provided one builds up a single string.  Still not
something to rely on.
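
The "use Queue instead" advice can be sketched as follows, assuming Python 3 (where the module is spelled queue rather than Queue). Producers build each line as one complete string and hand it to a single writer thread, which is the only thread that ever touches the file object. The function name write_via_queue and the DONE sentinel are hypothetical.

```python
import io
import queue
import threading


def write_via_queue(sink, n_producers=4, lines_each=50):
    """Producers enqueue complete lines; only one thread ever writes to `sink`."""
    q = queue.Queue()
    DONE = object()  # sentinel telling the writer thread to stop

    def writer():
        while True:
            item = q.get()
            if item is DONE:
                return
            sink.write(item)  # each item is one fully built string

    def producer(ident):
        for i in range(lines_each):
            q.put("producer %d line %d\n" % (ident, i))

    w = threading.Thread(target=writer)
    w.start()
    producers = [
        threading.Thread(target=producer, args=(k,)) for k in range(n_producers)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(DONE)  # enqueued after every producer has finished
    w.join()
```

Passing io.StringIO() as sink is enough to try it out; the order of lines from different producers is not fixed, but no line is ever split.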
-- 
Aahz ([EMAIL PROTECTED])   * http://www.pythoncraft.com/

The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code -- 
not in reams of trivial code that bores the reader to death.  --GvR


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Martin v. Löwis
Jeremy Hylton wrote:
   Are the thread semantics for file objects documented anywhere?

  Literally, the answer to your question is no.

 I'm surprised that it does not, for example, guarantee that reads and
 writes are atomic, since CPython relies on fread and fwrite which are
 atomic.

Where is the connection? Why would anything that CPython requires from
the C library have any effect on Python's documentation?

The only thing that affects the Python documentation is whether anybody
writes it. Nobody cares, so nobody writes documentation.

Remember, you were asking what behaviour is *documented*, not what
behaviour is guaranteed by the implementation (in a specific version
of the implementation).

 Also, there are other operations that go to the trouble of calling
 flockfile().  What's the point if we don't provide any guarantees?

Because nobody cares about guarantees in the documentation. Instead,
people care about observable behaviour. So if you get a crash due to a
race condition, you care, you report a bug, the Python developer agrees
it's a bug, and fixes it by adding synchronization.

Nobody reported a bug to the Python documentation.

 <0.6 wink>.  If it is not part of the specified behavior, then I
 suppose it's a quality of implementation issue.  Either way it would
 be helpful if the Python documentation said something, e.g. "you can
 rely on readline() being threadsafe" or "you can't, but the current
 CPython implementation happens to be".

It would be helpful to whom? To you? I doubt this, as you will be
the one who writes the documentation :-)

 readline() seemed like an interesting case because readlines() doesn't
 have the same implementation and the behavior is different.  So, as
 another example, you could ask whether readlines() has a bug or not.

Nobody knows. It depends on the Python developer who reviews the bug
report. Most likely, he considers it tricky and leaves it open for
somebody else. If his name is Martin, he will find that this is not
a bug (because it does not cause a crash, and does not contradict
the documentation), and he will reclassify it as a wishlist item. If
his name is Tim, and if he has a good day, he will fix it, and add
a comment on floating point numbers.

Regards,
Martin


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Tim Peters
[Jeremy Hylton]
...
 Universal newline reads and get_line() both lock the stream if the
 platform supports it.  So I expect that they are atomic on those
 platforms.

Well, certainly not get_line().  That locks and unlocks the stream
_inside_ an enclosing for-loop.  Looks quite possible for different
threads to read different parts of the same line if multiple threads
are trying to do get_line() simultaneously.  It releases the GIL
inside the for-loop too, so other threads _can_ sneak in.
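
The difference lock placement makes can be sketched with a toy model in Python. This is not the code in fileobject.c: io.BytesIO stands in for the C stream, a threading.Lock stands in for flockfile(), and the per-byte locking is an exaggeration chosen to make the effect obvious. The names LineReader, get_line_fine, get_line_coarse and drain are all hypothetical.

```python
import io
import threading


class LineReader:
    """Toy model: `raw` plays the C stream, `lock` plays flockfile()."""

    def __init__(self, data):
        self.raw = io.BytesIO(data)
        self.lock = threading.Lock()

    def get_line_fine(self):
        # The lock is taken and released inside the loop, so another thread
        # can run between iterations and take bytes from the middle of the
        # line this thread is assembling.
        out = bytearray()
        while True:
            with self.lock:
                c = self.raw.read(1)
            if not c:
                break
            out += c
            if c == b"\n":
                break
        return bytes(out)

    def get_line_coarse(self):
        # The lock is held for the whole line, so each call returns an
        # intact line.
        with self.lock:
            return self.raw.readline()


def drain(method, n_threads=4):
    """Call `method` from several threads until it returns b''."""
    lines = []

    def worker():
        while True:
            line = method()
            if not line:
                return
            lines.append(line)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lines
```

With get_line_fine the pieces returned to different threads may be fragments of lines, but every byte is still handed out exactly once; with get_line_coarse every piece is a whole line.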

We put a lot of work into speeding those getc()-in-a-loop functions. 
There was undocumented agreement at the time that they should be
thread-safe in this sense:  provided the platform C stdio wasn't
thread-braindead, then if you had N threads all simultaneously reading
a file object containing B bytes, while nobody wrote to that file
object, then the total number of bytes seen by all N threads would sum
to B at the time they all saw EOF.  This was a much stronger guarantee
than Perl provided at the time (and, for all I know, still provides),
and we (at least I) wrote little test programs at the time
demonstrating that the total number of bytes Perl saw in this case was
unpredictable, while Python's did sum to B.
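
A present-day version of such a test program might look like the sketch below. It assumes Python 3, where open(path, "rb") returns a buffered binary object that the io documentation describes as protecting its internal structures with a lock; it says nothing about the 2.4-era file object. The names total_bytes_seen and demo are hypothetical.

```python
import os
import tempfile
import threading


def total_bytes_seen(path, n_threads=8, chunk_size=64):
    """N threads read one shared file object until each sees EOF."""
    counts = [0] * n_threads
    with open(path, "rb") as f:  # one shared buffered binary file object
        def worker(slot):
            while True:
                chunk = f.read(chunk_size)
                if not chunk:  # this thread saw EOF
                    return
                counts[slot] += len(chunk)  # each thread owns its own slot

        threads = [
            threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sum(counts)


def demo(size=200000):
    """Write `size` bytes (B), then return the total seen by all threads."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"x" * size)
        return total_bytes_seen(path)
    finally:
        os.remove(path)
```

The check is only on the byte count, which is the weaker property described here; it says nothing about whether the chunks could be reassembled in order.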

Of course Perl didn't document any of this either, and in Pythonland it
was clearly specific to the horrid tricks in CPython's fileobject.c.

 But it certainly seems safe to conclude this is a quality of
 implementation issue.

Or a sheer pigheadedness-of-implementor issue <wink>.

  Otherwise, why bother with the flockfile() at all, right?  Or is there some
 correctness issue I'm not seeing that requires the locking for some basic
 safety in the implementation.

There are correctness issues, but we still ignore them; locking
relieves, but doesn't solve, them.  For example, C doesn't (and POSIX
doesn't either!) define what happens if you mix reads with writes on a
file opened for update unless a file-positioning operation (like seek)
intervenes, and that's pretty easy for threads to run afoul of. 
Python does nothing to stop you from trying, and behavior if you do is
truly all over the map across boxes.  IIRC, one of the multi-threaded
test programs I mentioned above provoked ugly death in the bowels of
MS's I/O libraries when I threw an undisciplined writer thread into
the mix too.  This was reported to MS, and their response was "so
don't do that -- it's undefined".  Locking the stream at least cuts down
the chance of that happening, although that's not the primary reason
for it.
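
One defensive pattern that follows from this, sketched in Python 3 under stated assumptions: serialise every access with a lock and begin each access with an explicit seek(), so a read never directly follows a write (or vice versa) without a file-positioning call in between. PositionedFile, read_at, write_at and demo are hypothetical names, not anything from ZODB or the standard library.

```python
import os
import tempfile
import threading


class PositionedFile:
    """Hypothetical wrapper: every access holds a lock and starts with seek()."""

    def __init__(self, path):
        self._f = open(path, "r+b")  # opened for update: reads and writes
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        with self._lock:
            self._f.seek(offset)  # positioning call before every read
            return self._f.read(size)

    def write_at(self, offset, data):
        with self._lock:
            self._f.seek(offset)  # positioning call before every write
            self._f.write(data)
            self._f.flush()

    def close(self):
        with self._lock:
            self._f.close()


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"0123456789")
        pf = PositionedFile(path)
        try:
            pf.write_at(2, b"AB")    # a write ...
            return pf.read_at(0, 6)  # ... then a read, with a seek in between
        finally:
            pf.close()
    finally:
        os.remove(path)
```

The lock also covers close(), so a close cannot overlap a read or write made through the same wrapper.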

Heck, we still have a years-open critical bug against segfaults when
one thread tries to close a file object while another thread is
reading from it, right?

 And even using a lock is stupid.

 ZODB's FileStorage is bristling with locks protecting multi-threaded
 access to file objects, therefore that can't be stupid.  QED

 Using a lock seemed like a good idea there and still seems like a good
 idea now :-).

Damn straight, and we're certain it has nothing to do with those large
runs of NUL bytes that sometimes overwrite people's critical data for
no reason at all <wink>.


Re: [Python-Dev] thread semantics for file objects

2005-03-17 Thread Jeremy Hylton
On Thu, 17 Mar 2005 23:57:52 +0100, Martin v. Löwis
[EMAIL PROTECTED] wrote:
 Remember, you were asking what behaviour is *documented*, not what
 behaviour is guaranteed by the implementation (in a specific version
 of the implementation).

Martin,

I think you're trying to find more finesse in my question than I ever
intended.  I intended to ask -- hey, what are the semantics we intend
in this case?  Since the documentation doesn't say, we could improve
it by capturing the intended semantics.

  Also, there are other operations that go to the trouble of calling
  flockfile().  What's the point if we don't provide any guarantees?
 
 Because nobody cares about guarantees in the documentation. Instead,
 people care about observable behaviour. So if you get a crash due to a
 race condition, you care, you report a bug, the Python developer agrees
 it's a bug, and fixes it by adding synchronization.

As Tim later reported, this wasn't to address a crash, but to appease a
pigheaded developer :-).  I'm surprised by your claim that whether
something is a bug depends on the person who reviews it.  In practice,
this may be the case, but I've always been under the impression that
there was rough consensus about what constituted a bug and what a
feature.  I'd certainly say it's a goal to strive for.

It sounds like the weakest intended behavior we have is the one Tim
reported:  "provided the platform C stdio wasn't thread-braindead,
then if you had N threads all simultaneously reading a file object
containing B bytes, while nobody wrote to that file object, then the
total number of bytes seen by all N threads would sum
to B at the time they all saw EOF."  It seems to me like a good idea
to document this intended behavior somewhere.

Jeremy