Re: [Pythonmac-SIG] Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-04 Thread Bob Ippolito
On Jan 4, 2005, at 5:56 AM, Jack Jansen wrote:
> On 3 Jan 2005, at 23:40, Bob Ippolito wrote:
>> Most people on Mac OS X have a lot of memory, and Mac OS X generally
>> does a good job about swapping in and out without causing much of a
>> problem, so I'm personally not very surprised that it could go
>> unnoticed this long.
>
> *Except* when you're low on free disk space. 10.2 and before were
> really bad with this, usually hanging the machine, 10.3 is better but
> it's still pretty bad when compared to other unixen. It probably has
> something to do with the way OSX overcommits memory and swapspace, for
> which it apparently uses a different algorithm than FreeBSD or Linux.
>
> I wouldn't be surprised if the bittorrent problem report in this
> thread was due to being low on diskspace. And that could also be true
> for the original error report that sparked this discussion.

I was able to trigger this bug with a considerable amount of free disk
space using a laptop that has 1GB of RAM, although I did have to 
increase the buffer size from the given example quite a bit to get it 
to fail.  After all, a 32-bit process can't have more than 4 GB of 
addressable memory.  I am pretty sure that OS X is never supposed to 
overcommit memory.  The disk thrashing probably has a lot to do with 
the fact that Mac OS X will grow and shrink its swap based on demand, 
rather than having a fixed size swap partition as is common on other 
unixen.  I've never seen the problem myself, though.

From what I remember about Linux, its malloc implementation merely 
increases the address space of a process.  The actual allocation will 
happen when you try and access the memory, and if it's overcommitted 
things will fail in a bad way.

-bob
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 2:16 AM, Tim Peters wrote:
[Bob Ippolito]
...
Your expectation is not correct for Darwin's memory allocation scheme.
It seems that Darwin creates allocations of immutable size.  The only
way ANY part of an allocation will ever be used by ANYTHING else is if
free() is called with that allocation.
Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.
Whatever that means.
free() can be called either explicitly, or implicitly by calling realloc()
with a size larger than the size of the allocation.  In that case, it will
create a new allocation of at least the requested size, copy the contents
of the original allocation into the new allocation (probably with
copy-on-write pages if it's large enough, so it might be cheap), and
free() the allocation.
Really?  Another near-universal "quality of implementation"
expectation is that a growing realloc() will strive to extend
in-place.  Like realloc(malloc(100), 101).  For example, the
theoretical guarantee that one-at-a-time list.append() has amortized
linear time doesn't depend on that, but pragmatically it's greatly
helped by a reasonable growing realloc() implementation.
I said that it created allocations of fixed size, not that it created 
allocations of exactly the size you asked it to.  Yes, it will extend 
in-place for many cases, including the given.

In the case where realloc() specifies a size that is not greater than the
allocation's size, it will simply return the given allocation and cause no
side-effects whatsoever.

Was this a good decision?  Probably not!
Sounds more like a bug (or two) to me than a decision, but I don't 
know.
You said yourself that it is standards compliant ;)  I have filed it as 
a bug, but it is probably unlikely to be backported to current versions 
of Mac OS X unless a case can be made that it is indeed a security 
flaw.

However, it is our (in the "I know you use Windows but I am not the only
one that uses Mac OS X" sense) problem so long as Darwin is a supported
platform, because it is highly unlikely that Apple will backport any fix to
the allocator unless we can prove it has some security implications in
software shipped with their OS. ...
Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too <wink>.
Known case?  No.  Do I want to search Python application-space to find 
one?  No.

Presumably this can happen at other places (including third party
extensions), so a better place to do this might be _PyString_Resize().
list_resize() is another reasonable place to put this.  I'm sure there
are other places that use realloc() too, and the majority of them do
this through obmalloc.  So maybe instead of trying to track down all
the places where this can manifest, we should just gunk up Python and
patch PyObject_Realloc()?
There is no choke point for allocations in Python -- some places
call the system realloc() directly.  Maybe the latter matter on Darwin
too, but maybe they don't.  The scope of this hack spreads if they do.
 I have no idea how often realloc() is called directly by 3rd-party
extension modules.  It's called directly a lot in Zope's C code, but
AFAICT only to grow vectors, never to shrink them.
In the case of Python, "some places" means "nowhere relevant".  Four
standard library extension modules relevant to the platform use realloc
directly:

_sre:
    Uses realloc only to grow buffers.
cPickle:
    Uses realloc only to grow buffers.
cStringIO:
    Uses realloc only to grow buffers.
regexpr:
    Uses realloc only to grow buffers.
If Zope doesn't use the allocator that Python gives it, then it can 
deal with its own problems.  I would expect most extensions to use 
Python's allocator.

Since we are both pretty confident that other allocators aren't like Darwin,
this gunk can be #ifdef'ed to the __APPLE__ case.
#ifdef's are a last resort:  they almost never go away, so they
complicate the code forever after, and typically stick around for
years even after the platform problems they intended to address have
been fixed.  For obvious reasons, they're also an endless source of
platform-specific bugs.
They're also the only good way to deal with platform-specific 
inconsistencies.  In this specific case, it's not even possible to 
determine if a particular allocator implementation is stupid or not 
without at least using a platform-allocator-specific function to query 
the size reserved by a given allocation.

Note that pymalloc already does a memcpy+free when in
PyObject_Realloc(p, n) p was obtained from the system malloc or
realloc but n is small enough to meet the small object threshold
(pymalloc takes over small blocks that result from a
PyObject_Realloc()).  That's a reasonable strategy *because* n is

Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Guido van Rossum
Coming late to this thread.

I don't see the point of lying awake at night worrying about potential
memory losses unless you've heard someone complain about it. As Tim
has been trying to explain, there are plenty of other things in Python
that we *could* speed up if there was a need; since every speedup
uglifies the code somewhat, we'd end up with very ugly code if we did
them all. Remember, don't optimize prematurely.

Here's one theoretical reason why even with socket.recv() it probably
doesn't matter in practice: the overallocated string will usually be
freed as soon as the data has been parsed from it, and this will free
the overallocation as well!

OTOH, if you want to do more research, checking the usage patterns for
StringRealloc and TupleRealloc would be useful. I could imagine code
in either that makes a copy if the new size is less than some fraction
of the old size. Most code that I recall writing using these tends to
start with a guaranteed-to-fit overallocation, and a single resize at
the end.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread bacchusrx
Tim Peters wrote:
 Is there any known case where Python performs poorly on this OS, for
 this reason, other than the "pass giant numbers to recv() and then
 shrink the string because we didn't get anywhere near that many bytes"
 case?
 
 [...]
 
 I agree the socket-abuse case should be fiddled, and for more reasons
 than just Darwin's realloc() quirks. [...] Yes, in the socket-abuse
 case, where the program routinely malloc()s strings millions of bytes
 larger than the socket can deliver, it would obviously help.  That's
 not typically program behavior (however typical it may be of that
 specific app).

Note that, with respect to http://python.org/sf/1092502, the author of
the (original) program was using the documented interface to a file
object.  It's _fileobject.read() that decides to ask for huge numbers of
bytes from recv() (specifically, in the max(self._rbufsize, left)
condition). Patched to use a fixed recv_size, you of course sidestep the
realloc() nastiness in this particular case.
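
For illustration only, here is a minimal sketch of the fixed recv_size idea. It is not the actual patch to _fileobject.read(); the helper name read_exact and the 8192-byte default are assumptions made up for this example. The point is just that each recv() request is capped, so no single string is allocated millions of bytes larger than what the socket delivers.

```python
import socket


def read_exact(sock, size, recv_size=8192):
    """Read up to `size` bytes, never asking recv() for more than recv_size.

    Hypothetical helper for illustration; not part of the socket module.
    """
    chunks = []
    left = size
    while left > 0:
        # Cap each request so the buffer allocated for recv() stays small,
        # however large `left` happens to be.
        data = sock.recv(min(left, recv_size))
        if not data:
            break  # peer closed the connection
        chunks.append(data)
        left -= len(data)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"x" * 3000)
    a.close()
    # Ask for far more than will ever arrive, as in the bug report.
    print(len(read_exact(b, 10**9, recv_size=1024)))
    b.close()
```

Capping the request only makes large over-allocations less likely; each chunk can still be shrunk after a short read.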

bacchusrx.


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Andrew P. Lentvorski, Jr.
On Jan 2, 2005, at 11:16 PM, Tim Peters wrote:
[Bob Ippolito]
However, it is our (in the "I know you use Windows but I am not the only
one that uses Mac OS X" sense) problem so long as Darwin is a supported
platform, because it is highly unlikely that Apple will backport any fix to
the allocator unless we can prove it has some security implications in
software shipped with their OS. ...
Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too <wink>.
Possibly.  When using the stock btdownloadcurses.py from bitconjurer.org,
I occasionally see a memory thrash on OS X.

Normally I have to be in a mode where I am aggregating lots of small
connections (10Kbps or less uploads) into a large download (10Mbps
transfer rate on a 500MB file).  When the file completes, Python sends
OS X into a long-lasting spinning ball of death.  It will emerge after
about 10 minutes or so.
I do not see this same behavior on Linux or FreeBSD.  I never filed a bug
because I can't reliably reproduce it (it is dependent upon the upload
characteristics of the torrent swarm).  However, it seems to fit the
bug and diagnosis.

-a


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Tim Peters
[Tim Peters]
 Ya, I understood that.  My conclusion was that Darwin's realloc()
 implementation isn't production-quality.  So it goes.

[Bob Ippolito]
 Whatever that means.

Well, it means what it said.  The C standard says nothing about
performance metrics of any kind, and a production-quality
implementation of C requires very much more than just meeting what the
standard requires.  The phrase "quality of implementation" is used in
the C Rationale (but not in the standard proper) to cover all such
issues.  realloc() pragmatics are quality-of-implementation issues;
the accuracy of fp arithmetic is another (e.g., if you get back -666.0
from the C "1.0 + 2.0", there's nothing in the standard to justify a
complaint).

  free() can be called either explicitly, or implicitly by calling
 realloc() with a size larger than the size of the allocation.

From later comments feigning outrage <wink>, I take it that "the size
of the allocation" here does not mean the specific number the user
passed to the previous malloc/realloc call, but means whatever amount
of address space the implementation decided to use internally.  Sorry,
but I assumed it meant the former at first.

...

 Was this a good decision?  Probably not!

 Sounds more like a bug (or two) to me than a decision, but I don't
 know.

 You said yourself that it is standards compliant ;)  I have filed it as
 a bug, but it is probably unlikely to be backported to current versions
 of Mac OS X unless a case can be made that it is indeed a security
 flaw.

That's plausible.  If you showed me a case where Python's list.sort()
took cubic time, I'd certainly consider that to be a bug, despite
that nothing promises better behavior.  If I wrote a malloc subsystem
and somebody pointed out "did you know that when I malloc 1024**2+1
bytes, and then realloc(1), I lose the other megabyte forever?", I'd
consider that to be a bug too (because, docs be damned, I wouldn't
intentionally design a malloc subsystem with such behavior; and
pymalloc does in fact copy bytes on a shrinking realloc in blocks it
controls, whenever at least a quarter of the space is given back --
and it didn't at the start, and I considered that to be a bug when
it was pointed out).
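
To make the "at least a quarter of the space is given back" rule concrete, here is a minimal sketch of the decision alone. The function name is hypothetical, and this models the policy as described above rather than quoting obmalloc.c.

```python
def shrink_should_copy(old_size, new_size):
    """Return True if a shrinking resize gives back at least a quarter
    of the block, i.e. copying into a smaller block is considered worth it.

    Hypothetical helper: it models the policy described in the text only.
    """
    if new_size > old_size:
        raise ValueError("not a shrinking resize")
    # Integer form of: old_size - new_size >= old_size / 4
    return 4 * new_size <= 3 * old_size
```

Under this rule the malloc(1024**2+1) followed by realloc(1) case copies the one byte and releases the megabyte, while trimming a 100-byte block to 90 bytes leaves the block alone.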

 ...
 Known case?  No.  Do I want to search Python application-space to find
 one?  No.

Serious problems on a platform are usually well-known to users on that
platform.  For example, it was well-known that Python's list-growing
strategy as of a few years ago fragmented address space horribly on
Win9X.  This was a C quality-of-implementation issue specific to that
platform.  It was eventually resolved by improving the list-growing
strategy on all platforms -- although it's still the case that Win9X
does worse on list-growing than other platforms, it's no longer a
disaster for most list-growing apps on Win9X.

If there's a problem with "overallocate then realloc() to cut back" on
Darwin that affects many apps, then I'd expect Darwin users to know
about that already -- lots of people have used Python on Macs since
Python's beginning, mysterious slowdowns and mysterious bloat get
noticed, and Darwin has been around for a while.

..

 There is no choke point for allocations in Python -- some places
 call the system realloc() directly.  Maybe the latter matter on Darwin
 too, but maybe they don't.  The scope of this hack spreads if they do.

...

 In the case of Python, some places means nowhere relevant.  Four
 standard library extension modules relevant to the platform use realloc
 directly:
 
 _sre
 Uses realloc only to grow buffers.
 cPickle
 Uses realloc only to grow buffers.
 cStringIO
 Uses realloc only to grow buffers.
 regexpr:
 Uses realloc only to grow buffers.

Good!

 If Zope doesn't use the allocator that Python gives it, then it can
 deal with its own problems.  I would expect most extensions to use
 Python's allocator.

I don't know.

...
 
 They're [#ifdef's] also the only good way to deal with platform-specific
 inconsistencies.  In this specific case, it's not even possible to
 determine if a particular allocator implementation is stupid or not
 without at least using a platform-allocator-specific function to query
 the size reserved by a given allocation.

We've had bad experience on several platforms when passing large
numbers to recv().  If that were addressed, it's unclear that Darwin
realloc() behavior would remain a real issue.  OTOH, it is clear that
*just* worming around Darwin realloc() behavior won't help other
platforms with problems in the same *immediate* area of bug 1092502. 
Gross over-allocation followed by a shrinking realloc() just isn't
common in Python.  sock_recv() is an exceptionally bad case.  More
typical is, e.g., fileobject.c's get_line(), where if a line exceeds
100 characters the buffer keeps growing by 25% until there's enough
room, then it's cut back once at the end.  That typical use for
shrinking realloc() just isn't going to be implicated in a real
problem -- the 

Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread bacchusrx
On Mon, Jan 03, 2005 at 03:55:19PM -0500, Bob Ippolito wrote:
 Note that, with respect to http://python.org/sf/1092502, the author
 of the (original) program was using the documented interface to a
 file object.  It's _fileobject.read() that decides to ask for huge
 numbers of bytes from recv() (specifically, in the
 max(self._rbufsize, left) condition). Patched to use a fixed
 recv_size, you of course sidestep the realloc() nastiness in this
 particular case.
 
 While using a reasonably sized recv_size is a good idea, using a
 smaller request size simply means that it's less likely that the
 strings will be significantly resized.  It is still highly likely they
 *will* be resized and that doesn't solve the problem that
 over-allocated strings will persist until the entire request is
 fulfilled.

You're right. I should have said, you're more likely to get away with
it. The underlying issue still exists. My point is that the problem is
not analogous to the guy who tried to read 2GB directly from a socket
(as in http://python.org/sf/756104). 

Googling for MemoryError exceptions, you can find a number of spurious
problems on Darwin that are probably due to this bug: SpamBayes for
instance, or the thread at

http://mail.python.org/pipermail/python-list/2004-November/250625.html

bacchusrx.


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 4:49 PM, Tim Peters wrote:
[Tim Peters]
Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.
[Bob Ippolito]
Whatever that means.
Well, it means what it said.  The C standard says nothing about
performance metrics of any kind, and a production-quality
implementation of C requires very much more than just meeting what the
standard requires.  The phrase "quality of implementation" is used in
the C Rationale (but not in the standard proper) to cover all such
issues.  realloc() pragmatics are quality-of-implementation issues;
the accuracy of fp arithmetic is another (e.g., if you get back -666.0
from the C "1.0 + 2.0", there's nothing in the standard to justify a
complaint).
free() can be called either explicitly, or implicitly by calling
realloc() with a size larger than the size of the allocation.
From later comments feigning outrage <wink>, I take it that "the size
of the allocation" here does not mean the specific number the user
passed to the previous malloc/realloc call, but means whatever amount
of address space the implementation decided to use internally.  Sorry,
but I assumed it meant the former at first.
Sorry for the confusion.
Was this a good decision?  Probably not!

Sounds more like a bug (or two) to me than a decision, but I don't
know.

You said yourself that it is standards compliant ;)  I have filed it as
a bug, but it is probably unlikely to be backported to current versions
of Mac OS X unless a case can be made that it is indeed a security
flaw.
That's plausible.  If you showed me a case where Python's list.sort()
took cubic time, I'd certainly consider that to be a bug, despite
that nothing promises better behavior.  If I wrote a malloc subsystem
and somebody pointed out "did you know that when I malloc 1024**2+1
bytes, and then realloc(1), I lose the other megabyte forever?", I'd
consider that to be a bug too (because, docs be damned, I wouldn't
intentionally design a malloc subsystem with such behavior; and
pymalloc does in fact copy bytes on a shrinking realloc in blocks it
controls, whenever at least a quarter of the space is given back --
and it didn't at the start, and I considered that to be a bug when
it was pointed out).
I wouldn't equate "until free() is called" with "forever".  But yes, I
consider it a bug just as you do, and have reported it appropriately.  
Practically, since it exists in Mac OS X 10.2 and Mac OS X 10.3, and 
may not ever be fixed, we should at least consider it.

...
Known case?  No.  Do I want to search Python application-space to find
one?  No.
Serious problems on a platform are usually well-known to users on that
platform.  For example, it was well-known that Python's list-growing
strategy as of a few years ago fragmented address space horribly on
Win9X.  This was a C quality-of-implementation issue specific to that
platform.  It was eventually resolved by improving the list-growing
strategy on all platforms -- although it's still the case that Win9X
does worse on list-growing than other platforms, it's no longer a
disaster for most list-growing apps on Win9X.
It does take a long time to figure such weird behavior out though.  I
would have to guess that most Python users on Darwin have been
at it for less than 3 years.

The number of people using Python on Darwin who have written or
used code that exercised this scenario and are determined enough to
track this sort of thing down is probably very small.

If there's a problem with "overallocate then realloc() to cut back" on
Darwin that affects many apps, then I'd expect Darwin users to know
about that already -- lots of people have used Python on Macs since
Python's beginning, mysterious slowdowns and mysterious bloat get
noticed, and Darwin has been around for a while.
Most people on Mac OS X have a lot of memory, and Mac OS X generally 
does a good job about swapping in and out without causing much of a 
problem, so I'm personally not very surprised that it could go 
unnoticed this long.

Google says:
Results 1 - 10 of about 1,150 for (darwin OR Mac OR OS X) AND 
MemoryError AND Python.
Results 1 - 10 of about 942 for malloc vm_allocate failed. (0.73 
seconds) 

Of course, in both cases, not all of these can be attributed to 
realloc()'s implementation, but I'm sure some of them can, especially 
the Python ones!

They're [#ifdef's] also the only good way to deal with platform-specific
inconsistencies.  In this specific case, it's not even possible to
determine if a particular allocator implementation is stupid or not
without at least using a platform-allocator-specific function to query
the size reserved by a given allocation.
We've had bad experience on several platforms when passing large
numbers to recv().  If that were addressed, it's unclear that Darwin
realloc() behavior would remain a real issue.  OTOH, it is clear that
*just* worming around Darwin realloc() behavior won't help other
platforms with problems in 

[Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-02 Thread Bob Ippolito
Quite a few notable places in the Python sources expect realloc(...) to 
relinquish some memory if the requested size is smaller than the 
currently allocated size.  This is definitely not true on Darwin, and 
possibly other platforms.  I have tested this on OpenBSD and Linux, and 
the implementations on these platforms do appear to relinquish memory, 
but I didn't read the implementation.  I haven't been able to find any 
documentation that states that realloc should make this guarantee, but 
I figure Darwin does this as an optimization and because Darwin 
probably can't resize mmap'ed memory (at least it can't from Python, 
but this probably means it doesn't have this capability at all).

It is possible to fix this for Darwin, because you can ask the 
default malloc zone how big a particular allocation is, and how big an 
allocation of a given size will actually be (see: malloc/malloc.h).  
The obvious place to put this would be PyObject_Realloc, because this 
is at least called by _PyString_Resize (which will fix 
http://python.org/sf/1092502).

Should I write up a patch that fixes this?  I guess the best thing to 
do would be to determine whether the fix should be used at runtime, by 
allocating a meg or so, resizing it to 1 byte, and see if the size of 
the allocation changes.  If the size of the allocation does change, 
then the system realloc can be trusted to do what Python expects it to 
do, otherwise realloc should be done cleanly by allocating a new 
block (returning the original on failure, because it's good enough and 
some places in Python seem to expect that shrink will never fail), 
memcpy, free, return new block.
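
As a rough sketch of the runtime probe described above (not the patch itself, which would live in C next to PyObject_Realloc), the check can be tried from Python with ctypes. The function name realloc_shrinks is made up for this example. It assumes the C library exposes malloc_size() (Darwin, see malloc/malloc.h) or malloc_usable_size() (glibc) and returns None when neither is found.

```python
import ctypes
import ctypes.util


def realloc_shrinks(big=1 << 20):
    """Return True/False if a shrinking realloc() reduces the reserved size,
    or None if this platform offers no known way to ask.

    Hypothetical probe for illustration; the size-query names are assumptions
    about the platform's C library.
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"))
    except (OSError, TypeError):
        return None

    size_of = None
    # malloc_size() is Darwin's query; malloc_usable_size() is glibc's.
    for name in ("malloc_size", "malloc_usable_size"):
        try:
            size_of = getattr(libc, name)
            break
        except AttributeError:
            continue
    if size_of is None:
        return None

    # Declare signatures so pointers are not truncated to C int.
    size_of.argtypes = [ctypes.c_void_p]
    size_of.restype = ctypes.c_size_t
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.malloc.restype = ctypes.c_void_p
    libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.realloc.restype = ctypes.c_void_p
    libc.free.argtypes = [ctypes.c_void_p]
    libc.free.restype = None

    p = libc.malloc(big)          # allocate a meg or so
    if not p:
        return None
    before = size_of(p)
    q = libc.realloc(p, 1)        # resize it to 1 byte
    if not q:
        libc.free(p)              # a failed realloc() leaves p valid
        return None
    after = size_of(q)
    libc.free(q)
    return after < before


if __name__ == "__main__":
    print(realloc_shrinks())
```

A False result corresponds to the case where the allocate, memcpy, free fallback would be used.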

I wrote up a small hack that does this realloc indirection to CVS 
trunk, and it doesn't seem to cause any measurable difference in 
pystone performance.

Note that all versions of Darwin that I've looked at (6.x, 7.x, and 
8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have 
this issue, but it might go away by Mac OS X 10.4 or some later 
release.

This URL points to the sf bug and Darwin 7.7's realloc(...) 
implementation: 
http://bob.pythonmac.org/archives/2005/01/01/realloc-doesnt/

-bob


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-02 Thread Tim Peters
[Bob Ippolito]
 ...
 Your expectation is not correct for Darwin's memory allocation scheme.
 It seems that Darwin creates allocations of immutable size.  The only
 way ANY part of an allocation will ever be used by ANYTHING else is if
 free() is called with that allocation.

Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.

 free() can be called either explicitly, or implicitly by calling realloc()
 with a size larger than the size of the allocation.  In that case, it will
 create a new allocation of at least the requested size, copy the contents
 of the original allocation into the new allocation (probably with
 copy-on-write pages if it's large enough, so it might be cheap), and
 free() the allocation.

Really?  Another near-universal "quality of implementation"
expectation is that a growing realloc() will strive to extend
in-place.  Like realloc(malloc(100), 101).  For example, the
theoretical guarantee that one-at-a-time list.append() has amortized
linear time doesn't depend on that, but pragmatically it's greatly
helped by a reasonable growing realloc() implementation.
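
A small simulation may help show why the amortized bound does not depend on in-place growth. It assumes the worst case, where every growing realloc() moves the block and copies every element, and it uses a proportional over-allocation rule (about 1/8 extra plus a small constant). Both the function name and the exact constants are assumptions for illustration.

```python
def copies_for_appends(n):
    """Count element copies for n one-at-a-time appends, assuming every
    growing realloc() has to move the whole block."""
    size = allocated = copies = 0
    for _ in range(n):
        new_size = size + 1
        if new_size > allocated:
            copies += size  # worst case: realloc() copies every element
            # Assumed growth rule: roughly 1/8 extra room plus a constant.
            allocated = new_size + (new_size >> 3) + (3 if new_size < 9 else 6)
        size = new_size
    return copies


if __name__ == "__main__":
    for n in (1000, 100000):
        print(n, copies_for_appends(n))
```

Because the capacity grows geometrically, the total number of copies stays below a constant multiple of n even in this worst case. A realloc() that extends in place only lowers the constant.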

 In the case where realloc() specifies a size that is not greater than the
 allocation's size, it will simply return the given allocation and cause no
 side-effects whatsoever.

 Was this a good decision?  Probably not!

Sounds more like a bug (or two) to me than a decision, but I don't know.

 However, it is our (in the "I know you use Windows but I am not the only
 one that uses Mac OS X" sense) problem so long as Darwin is a supported
 platform, because it is highly unlikely that Apple will backport any fix to
 the allocator unless we can prove it has some security implications in
 software shipped with their OS. ...

Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too <wink>.

...
 Presumably this can happen at other places (including third party
 extensions), so a better place to do this might be _PyString_Resize().
 list_resize() is another reasonable place to put this.  I'm sure there
 are other places that use realloc() too, and the majority of them do
 this through obmalloc.  So maybe instead of trying to track down all
 the places where this can manifest, we should just gunk up Python and
 patch PyObject_Realloc()?

There is no choke point for allocations in Python -- some places
call the system realloc() directly.  Maybe the latter matter on Darwin
too, but maybe they don't.  The scope of this hack spreads if they do.
 I have no idea how often realloc() is called directly by 3rd-party
extension modules.  It's called directly a lot in Zope's C code, but
AFAICT only to grow vectors, never to shrink them.

 Since we are both pretty confident that other allocators aren't like Darwin,
 this gunk can be #ifdef'ed to the __APPLE__ case.

#ifdef's are a last resort:  they almost never go away, so they
complicate the code forever after, and typically stick around for
years even after the platform problems they intended to address have
been fixed.  For obvious reasons, they're also an endless source of
platform-specific bugs.

Note that pymalloc already does a memcpy+free when in
PyObject_Realloc(p, n) p was obtained from the system malloc or
realloc but n is small enough to meet the small object threshold
(pymalloc takes over small blocks that result from a
PyObject_Realloc()).  That's a reasonable strategy *because* n is
always small in such cases.  If you're going to extend this strategy
to n of arbitrary size, then you may also create new performance
problems for some apps on Darwin (copying n bytes can get arbitrarily
expensive).

 ...
  I'm sure I'll find something, but what's important to me is that Python
 works well on Mac OS X, so something should happen.

I agree the socket-abuse case should be fiddled, and for more reasons
than just Darwin's realloc() quirks.  I don't know that there are
actual problems on Darwin broader than that case (and I'm not
challenging you to contrive one, I'm asking whether realloc() quirks
are suspected in any other case that's known).  Part of what you
demonstrated when you said that pystone didn't slow down when you
fiddled stuff is that pystone also didn't speed up.  I also don't know
that the memcpy+free wormaround is actually going to help more than it
hurts overall.  Yes, in the socket-abuse case, where the program
routinely malloc()s strings millions of bytes larger than the socket
can deliver, it would obviously help.  That's not typically program
behavior (however typical it may be of that specific app).  More
typical is shrinking a long list one element at a time, in which case
about half the list remaining would get memcpy'd from time to time
where such copies never get made