Re: [Pythonmac-SIG] Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Jan 4, 2005, at 5:56 AM, Jack Jansen wrote: On 3 Jan 2005, at 23:40, Bob Ippolito wrote: Most people on Mac OS X have a lot of memory, and Mac OS X generally does a good job about swapping in and out without causing much of a problem, so I'm personally not very surprised that it could go unnoticed this long. *Except* when you're low on free disk space. 10.2 and before were really bad with this, usually hanging the machine, 10.3 is better but it's still pretty bad when compared to other unixen. It probably has something to do with the way OSX overcommits memory and swapspace, for which it apparently uses a different algorithm than FreeBSD or Linux. I wouldn't be surprised if the bittorrent problem report in this thread was due to being low on diskspace. And that could also be true for the original error report that sparked this discussion. I was able to trigger this bug with a considerable amount of free disk space using a laptop that has 1GB of RAM, although I did have to increase the buffer size from the given example quite a bit to get it to fail. After all, a 32-bit process can't have more than 4 GB of addressable memory. I am pretty sure that OS X is never supposed to overcommit memory. The disk thrashing probably has a lot to do with the fact that Mac OS X will grow and shrink its swap based on demand, rather than having a fixed size swap partition as is common on other unixen. I've never seen the problem myself, though. From what I remember about Linux, its malloc implementation merely increases the address space of a process. The actual allocation will happen when you try and access the memory, and if it's overcommitted things will fail in a bad way. -bob ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Jan 3, 2005, at 2:16 AM, Tim Peters wrote: [Bob Ippolito] ... Your expectation is not correct for Darwin's memory allocation scheme. It seems that Darwin creates allocations of immutable size. The only way ANY part of an allocation will ever be used by ANYTHING else is if free() is called with that allocation. Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. Whatever that means. free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. In that case, it will create a new allocation of at least the requested size, copy the contents of the original allocation into the new allocation (probably with copy-on-write pages if it's large enough, so it might be cheap), and free() the allocation. Really? Another near-universal quality of implementation expectation is that a growing realloc() will strive to extend in-place. Like realloc(malloc(100), 101). For example, the theoretical guarantee that one-at-a-time list.append() has amortized linear time doesn't depend on that, but pragmatically it's greatly helped by a reasonable growing realloc() implementation. I said that it created allocations of fixed size, not that it created allocations of exactly the size you asked it to. Yes, it will extend in-place for many cases, including the given. In the case where realloc() specifies a size that is not greater than the allocation's size, it will simply return the given allocation and cause no side- effects whatsoever. Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. You said yourself that it is standards compliant ;) I have filed it as a bug, but it is probably unlikely to be backported to current versions of Mac OS X unless a case can be made that it is indeed a security flaw. 
However, it is our (in the I know you use Windows but I am not the only one that uses Mac OS X sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any fix to the allocator unless we can prove it has some security implications in software shipped with their OS. ... Is there any known case where Python performs poorly on this OS, for this reason, other than the pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes case? Claiming rampant performance problems should require evidence too wink. Known case? No. Do I want to search Python application-space to find one? No. Presumably this can happen at other places (including third party extensions), so a better place to do this might be _PyString_Resize(). list_resize() is another reasonable place to put this. I'm sure there are other places that use realloc() too, and the majority of them do this through obmalloc. So maybe instead of trying to track down all the places where this can manifest, we should just gunk up Python and patch PyObject_Realloc()? There is no choke point for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. I have no idea how often realloc() is called directly by 3rd-party extension modules. It's called directly a lot in Zope's C code, but AFAICT only to grow vectors, never to shrink them. In the case of Python, some places means nowhere relevant. Four standard library extension modules relevant to the platform use realloc directly: _sre Uses realloc only to grow buffers. cPickle Uses realloc only to grow buffers. cStringIO Uses realloc only to grow buffers. regexpr: Uses realloc only to grow buffers. If Zope doesn't use the allocator that Python gives it, then it can deal with its own problems. I would expect most extensions to use Python's allocator. 
Since we are both pretty confident that other allocators aren't like Darwin, this gunk can be #ifdef'ed to the __APPLE__ case. #ifdef's are a last resort: they almost never go away, so they complicate the code forever after, and typically stick around for years even after the platform problems they intended to address have been fixed. For obvious reasons, they're also an endless source of platform-specific bugs. They're also the only good way to deal with platform-specific inconsistencies. In this specific case, it's not even possible to determine if a particular allocator implementation is stupid or not without at least using a platform-allocator-specific function to query the size reserved by a given allocation. Note that pymalloc already does a memcpy+free when in PyObject_Realloc(p, n) p was obtained from the system malloc or realloc but n is small enough to meet the small object threshold (pymalloc takes over small blocks that result from a PyObject_Realloc()). That's a reasonable strategy *because* n is
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
Coming late to this thread. I don't see the point of lying awake at night worrying about potential memory losses unless you've heard someone complain about it. As Tim has been trying to explain, there are plenty of other things in Python that we *could* speed up if there was a need; since every speedup uglifies the code somewhat, we'd end up with very ugly code if we did them all. Remember, don't optimize prematurely.

Here's one theoretical reason why even with socket.recv() it probably doesn't matter in practice: the overallocated string will usually be freed as soon as the data has been parsed from it, and this will free the overallocation as well!

OTOH, if you want to do more research, checking the usage patterns for StringRealloc and TupleRealloc would be useful. I could imagine code in either that makes a copy if the new size is less than some fraction of the old size. Most code that I recall writing using these tends to start with a guaranteed-to-fit overallocation, and a single resize at the end.

--Guido van Rossum (home page: http://www.python.org/~guido/)
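The "copy if the new size is less than some fraction of the old size" idea can be sketched roughly as follows. This is only an illustration in Python, not code from the Python sources: `should_copy_on_shrink` and `resize_string` are hypothetical names, and the 0.75 default is an assumption chosen to echo Tim's remark elsewhere in the thread that pymalloc copies when at least a quarter of the space is given back.

```python
def should_copy_on_shrink(old_size: int, new_size: int, fraction: float = 0.75) -> bool:
    """Return True when a shrinking resize should copy into a fresh, smaller block."""
    if new_size >= old_size:
        return False  # not a shrink, so the copy policy does not apply
    return new_size < fraction * old_size


def resize_string(buf: bytearray, new_size: int, fraction: float = 0.75) -> bytearray:
    """Hypothetical stand-in for a string resize that applies the policy above."""
    if new_size > len(buf):
        raise ValueError("this sketch only handles shrinking")
    if should_copy_on_shrink(len(buf), new_size, fraction):
        # Large shrink: build a new object sized to the data and drop the old one.
        return bytearray(buf[:new_size])
    # Small trim: keep the existing object and cut it back in place.
    del buf[new_size:]
    return buf
```

With a policy like this, the common "overallocate, then trim a little at the end" pattern keeps its cheap path, and only a gross over-allocation pays for a copy.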
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Thu, Jan 01, 1970 at 12:00:00AM +, Tim Peters wrote:

Is there any known case where Python performs poorly on this OS, for this reason, other than the "pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes" case? [...] I agree the socket-abuse case should be fiddled, and for more reasons than just Darwin's realloc() quirks. [...] Yes, in the socket-abuse case, where the program routinely malloc()s strings millions of bytes larger than the socket can deliver, it would obviously help. That's not typical program behavior (however typical it may be of that specific app).

Note that, with respect to http://python.org/sf/1092502, the author of the (original) program was using the documented interface to a file object. It's _fileobject.read() that decides to ask for huge numbers of bytes from recv() (specifically, in the max(self._rbufsize, left) condition). Patched to use a fixed recv_size, you of course sidestep the realloc() nastiness in this particular case.

bacchusrx.
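A read loop that caps each recv() request might look roughly like the sketch below. It is not the patch attached to the bug report: `read_exact` and the 8192-byte default for `recv_size` are assumptions made for illustration, and the code is written against the current `socket` module rather than the 2005 `_fileobject` class.

```python
import socket


def read_exact(sock: socket.socket, size: int, recv_size: int = 8192) -> bytes:
    """Read up to `size` bytes, never asking recv() for more than `recv_size`."""
    chunks = []
    left = size
    while left > 0:
        # Cap the request so each returned string is close to its allocated size.
        data = sock.recv(min(recv_size, left))
        if not data:
            break  # peer closed the connection before `size` bytes arrived
        chunks.append(data)
        left -= len(data)
    return b"".join(chunks)
```

Each chunk can still be somewhat shorter than what was requested, so this narrows the gap between requested and delivered bytes rather than removing it.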
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Jan 2, 2005, at 11:16 PM, Tim Peters wrote:

[Bob Ippolito] However, it is our (in the "I know you use Windows but I am not the only one that uses Mac OS X" sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any fix to the allocator unless we can prove it has some security implications in software shipped with their OS. ...

Is there any known case where Python performs poorly on this OS, for this reason, other than the "pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes" case? Claiming rampant performance problems should require evidence too <wink>.

Possibly. When using the stock btdownloadcurses.py from bitconjurer.org, I occasionally see a memory thrash on OS X. Normally I have to be in a mode where I am aggregating lots of small connections (10Kbps or less uploads) into a large download (10Mbps transfer rate on a 500MB file). When the file completes, Python sends OS X into a long-lasting spinning ball of death. It will emerge after about 10 minutes or so. I do not see this same behavior on Linux or FreeBSD. I never filed a bug because I can't reliably reproduce it (it is dependent upon the upload characteristics of the torrent swarm). However, it seems to fit the bug and diagnosis.

-a
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
[Tim Peters] Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. [Bob Ippolito] Whatever that means. Well, it means what it said. The C standard says nothing about performance metrics of any kind, and a production-quality implementation of C requires very much more than just meeting what the standard requires. The phrase quality of implementation is used in the C Rationale (but not in the standard proper) to cover all such issues. realloc() pragmatics are quality-of-implementation issues; the accuracy of fp arithmetic is another (e.g., if you get back -666.0 from the C 1.0 + 2.0, there's nothing in the standard to justify a complaint). free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. From later comments feigning outrage wink, I take it that the size of the allocation here does not mean the specific number the user passed to the previous malloc/realloc call, but means whatever amount of address space the implementation decided to use internally. Sorry, but I assumed it meant the former at first. ... Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. You said yourself that it is standards compliant ;) I have filed it as a bug, but it is probably unlikely to be backported to current versions of Mac OS X unless a case can be made that it is indeed a security flaw. That's plausible. If you showed me a case where Python's list.sort() took cubic time, I'd certainly consider that to be a bug, despite that nothing promises better behavior. 
If I wrote a malloc subsystem and somebody pointed out did you know that when I malloc 1024**2+1 bytes, and then realloc(1), I lose the other megabyte forever?, I'd consider that to be a bug too (because, docs be damned, I wouldn't intentionally design a malloc subsystem with such behavior; and pymalloc does in fact copy bytes on a shrinking realloc in blocks it controls, whenever at least a quarter of the space is given back -- and it didn't at the start, and I considered that to be a bug when it was pointed out). ... Known case? No. Do I want to search Python application-space to find one? No. Serious problems on a platform are usually well-known to users on that platform. For example, it was well-known that Python's list-growing strategy as of a few years ago fragmented address space horribly on Win9X. This was a C quality-of-implementation issue specific to that platform. It was eventually resolved by improving the list-growing strategy on all platforms -- although it's still the case that Win9X does worse on list-growing than other platforms, it's no longer a disaster for most list-growing apps on Win9X. If there's a problem with overallocate then realloc() to cut back on Darwin that affects many apps, then I'd expect Darwin users to know about that already -- lots of people have used Python on Macs since Python's beginning, mysterious slowdowns and mysterious bloat get noticed, and Darwin has been around for a while. .. There is no choke point for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. ... In the case of Python, some places means nowhere relevant. Four standard library extension modules relevant to the platform use realloc directly: _sre Uses realloc only to grow buffers. cPickle Uses realloc only to grow buffers. cStringIO Uses realloc only to grow buffers. regexpr: Uses realloc only to grow buffers. Good! 
If Zope doesn't use the allocator that Python gives it, then it can deal with its own problems. I would expect most extensions to use Python's allocator. I don't know. ... They're [#ifdef's] also the only good way to deal with platform-specific inconsistencies. In this specific case, it's not even possible to determine if a particular allocator implementation is stupid or not without at least using a platform-allocator-specific function to query the size reserved by a given allocation. We've had bad experience on several platforms when passing large numbers to recv(). If that were addressed, it's unclear that Darwin realloc() behavior would remain a real issue. OTOH, it is clear that *just* worming around Darwin realloc() behavior won't help other platforms with problems in the same *immediate* area of bug 1092502. Gross over-allocation followed by a shrinking realloc() just isn't common in Python. sock_recv() is an exceptionally bad case. More typical is, e.g., fileobject.c's get_line(), where if a line exceeds 100 characters the buffer keeps growing by 25% until there's enough room, then it's cut back once at the end. That typical use for shrinking realloc() just isn't going to be implicated in a real problem -- the
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Mon, Jan 03, 2005 at 03:55:19PM -0500, Bob Ippolito wrote:

Note that, with respect to http://python.org/sf/1092502, the author of the (original) program was using the documented interface to a file object. It's _fileobject.read() that decides to ask for huge numbers of bytes from recv() (specifically, in the max(self._rbufsize, left) condition). Patched to use a fixed recv_size, you of course sidestep the realloc() nastiness in this particular case.

While using a reasonably sized recv_size is a good idea, using a smaller request size simply means that it's less likely that the strings will be significantly resized. It is still highly likely they *will* be resized and that doesn't solve the problem that over-allocated strings will persist until the entire request is fulfilled.

You're right. I should have said, you're more likely to get away with it. The underlying issue still exists. My point is that the problem is not analogous to the guy who tried to read 2GB directly from a socket (as in http://python.org/sf/756104). Googling for MemoryError exceptions, you can find a number of spurious problems on Darwin that are probably due to this bug: SpamBayes for instance, or the thread at http://mail.python.org/pipermail/python-list/2004-November/250625.html

bacchusrx.
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
On Jan 3, 2005, at 4:49 PM, Tim Peters wrote: [Tim Peters] Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. [Bob Ippolito] Whatever that means. Well, it means what it said. The C standard says nothing about performance metrics of any kind, and a production-quality implementation of C requires very much more than just meeting what the standard requires. The phrase quality of implementation is used in the C Rationale (but not in the standard proper) to cover all such issues. realloc() pragmatics are quality-of-implementation issues; the accuracy of fp arithmetic is another (e.g., if you get back -666.0 from the C 1.0 + 2.0, there's nothing in the standard to justify a complaint). free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. From later comments feigning outrage wink, I take it that the size of the allocation here does not mean the specific number the user passed to the previous malloc/realloc call, but means whatever amount of address space the implementation decided to use internally. Sorry, but I assumed it meant the former at first. Sorry for the confusion. Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. You said yourself that it is standards compliant ;) I have filed it as a bug, but it is probably unlikely to be backported to current versions of Mac OS X unless a case can be made that it is indeed a security flaw. That's plausible. If you showed me a case where Python's list.sort() took cubic time, I'd certainly consider that to be a bug, despite that nothing promises better behavior. 
If I wrote a malloc subsystem and somebody pointed out "did you know that when I malloc 1024**2+1 bytes, and then realloc(1), I lose the other megabyte forever?", I'd consider that to be a bug too (because, docs be damned, I wouldn't intentionally design a malloc subsystem with such behavior; and pymalloc does in fact copy bytes on a shrinking realloc in blocks it controls, whenever at least a quarter of the space is given back -- and it didn't at the start, and I considered that to be a bug when it was pointed out). I wouldn't equate "until free() is called" with "forever". But yes, I consider it a bug just as you do, and have reported it appropriately. Practically, since it exists in Mac OS X 10.2 and Mac OS X 10.3, and may not ever be fixed, we should at least consider it. ... Known case? No. Do I want to search Python application-space to find one? No. Serious problems on a platform are usually well-known to users on that platform. For example, it was well-known that Python's list-growing strategy as of a few years ago fragmented address space horribly on Win9X. This was a C quality-of-implementation issue specific to that platform. It was eventually resolved by improving the list-growing strategy on all platforms -- although it's still the case that Win9X does worse on list-growing than other platforms, it's no longer a disaster for most list-growing apps on Win9X. It does take a long time to figure such weird behavior out though. I would have to guess that most Python users on Darwin have been at it for less than 3 years. The number of people using Python on Darwin who have written or used code that exercised this scenario and are determined enough to track this sort of thing down is probably very small.
If there's a problem with overallocate then realloc() to cut back on Darwin that affects many apps, then I'd expect Darwin users to know about that already -- lots of people have used Python on Macs since Python's beginning, mysterious slowdowns and mysterious bloat get noticed, and Darwin has been around for a while. Most people on Mac OS X have a lot of memory, and Mac OS X generally does a good job about swapping in and out without causing much of a problem, so I'm personally not very surprised that it could go unnoticed this long. Google says: Results 1 - 10 of about 1,150 for (darwin OR Mac OR OS X) AND MemoryError AND Python. Results 1 - 10 of about 942 for malloc vm_allocate failed. (0.73 seconds) Of course, in both cases, not all of these can be attributed to realloc()'s implementation, but I'm sure some of them can, especially the Python ones! They're [#ifdef's] also the only good way to deal with platform-specific inconsistencies. In this specific case, it's not even possible to determine if a particular allocator implementation is stupid or not without at least using a platform-allocator-specific function to query the size reserved by a given allocation. We've had bad experience on several platforms when passing large numbers to recv(). If that were addressed, it's unclear that Darwin realloc() behavior would remain a real issue. OTOH, it is clear that *just* worming around Darwin realloc() behavior won't help other platforms with problems in
[Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
Quite a few notable places in the Python sources expect realloc(...) to relinquish some memory if the requested size is smaller than the currently allocated size. This is definitely not true on Darwin, and possibly other platforms. I have tested this on OpenBSD and Linux, and the implementations on these platforms do appear to relinquish memory, but I didn't read the implementation. I haven't been able to find any documentation that states that realloc should make this guarantee, but I figure Darwin does this as an optimization and because Darwin probably can't resize mmap'ed memory (at least it can't from Python, but this probably means it doesn't have this capability at all). It is possible to fix this for Darwin, because you can ask the default malloc zone how big a particular allocation is, and how big an allocation of a given size will actually be (see: malloc/malloc.h). The obvious place to put this would be PyObject_Realloc, because this is at least called by _PyString_Resize (which will fix http://python.org/sf/1092502). Should I write up a patch that fixes this? I guess the best thing to do would be to determine whether the fix should be used at runtime, by allocating a meg or so, resizing it to 1 byte, and see if the size of the allocation changes. If the size of the allocation does change, then the system realloc can be trusted to do what Python expects it to do, otherwise realloc should be done cleanly by allocating a new block (returning the original on failure, because it's good enough and some places in Python seem to expect that shrink will never fail), memcpy, free, return new block. I wrote up a small hack that does this realloc indirection to CVS trunk, and it doesn't seem to cause any measurable difference in pystone performance. Note that all versions of Darwin that I've looked at (6.x, 7.x, and 8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have this issue, but it might go away by Mac OS X 10.4 or some later release. 
This URL points to the sf bug and Darwin 7.7's realloc(...) implementation: http://bob.pythonmac.org/archives/2005/01/01/realloc-doesnt/

-bob
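The runtime probe and the malloc + memcpy + free fallback described in this message can be modelled with a toy allocator. The real fix would be C code inside PyObject_Realloc using the allocation-size queries from malloc/malloc.h; everything below (`NonShrinkingAllocator`, `realloc_shrinks`, `shrinking_realloc`) is a hypothetical Python stand-in meant only to show the control flow, not the proposed patch.

```python
class NonShrinkingAllocator:
    """Toy model of an allocator whose realloc() never shrinks a block."""

    def __init__(self):
        self.blocks = {}        # handle -> bytearray standing in for reserved memory
        self._next_handle = 1

    def malloc(self, size):
        handle = self._next_handle
        self._next_handle += 1
        self.blocks[handle] = bytearray(size)
        return handle

    def free(self, handle):
        del self.blocks[handle]

    def reserved_size(self, handle):
        return len(self.blocks[handle])

    def realloc(self, handle, size):
        old = self.blocks[handle]
        if size <= len(old):
            return handle       # shrink request: same block back, nothing released
        new_handle = self.malloc(size)
        self.blocks[new_handle][:len(old)] = old   # grow: copy, then free the original
        self.free(handle)
        return new_handle


def realloc_shrinks(alloc, probe_size=1024 * 1024):
    """Runtime probe: allocate about a megabyte, shrink it to 1 byte, compare sizes."""
    handle = alloc.malloc(probe_size)
    handle = alloc.realloc(handle, 1)
    shrank = alloc.reserved_size(handle) < probe_size
    alloc.free(handle)
    return shrank


def shrinking_realloc(alloc, handle, size):
    """Shrink by malloc + copy + free when the allocator's realloc will not."""
    if size >= alloc.reserved_size(handle):
        return alloc.realloc(handle, size)
    try:
        new_handle = alloc.malloc(size)
    except MemoryError:
        return handle           # a shrink must not fail: the original is good enough
    alloc.blocks[new_handle][:] = alloc.blocks[handle][:size]
    alloc.free(handle)
    return new_handle
```

In this model the probe reports that realloc does not shrink, so a caller would route shrinking requests through `shrinking_realloc`; on an allocator where the probe succeeds, the plain realloc could be used unchanged.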
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
[Bob Ippolito] ... Your expectation is not correct for Darwin's memory allocation scheme. It seems that Darwin creates allocations of immutable size. The only way ANY part of an allocation will ever be used by ANYTHING else is if free() is called with that allocation. Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. In that case, it will create a new allocation of at least the requested size, copy the contents of the original allocation into the new allocation (probably with copy-on-write pages if it's large enough, so it might be cheap), and free() the allocation. Really? Another near-universal quality of implementation expectation is that a growing realloc() will strive to extend in-place. Like realloc(malloc(100), 101). For example, the theoretical guarantee that one-at-a-time list.append() has amortized linear time doesn't depend on that, but pragmatically it's greatly helped by a reasonable growing realloc() implementation. In the case where realloc() specifies a size that is not greater than the allocation's size, it will simply return the given allocation and cause no side- effects whatsoever. Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. However, it is our (in the I know you use Windows but I am not the only one that uses Mac OS X sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any fix to the allocator unless we can prove it has some security implications in software shipped with their OS. ... Is there any known case where Python performs poorly on this OS, for this reason, other than the pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes case? 
Claiming rampant performance problems should require evidence too <wink>. ... Presumably this can happen at other places (including third party extensions), so a better place to do this might be _PyString_Resize(). list_resize() is another reasonable place to put this. I'm sure there are other places that use realloc() too, and the majority of them do this through obmalloc. So maybe instead of trying to track down all the places where this can manifest, we should just gunk up Python and patch PyObject_Realloc()? There is no choke point for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. I have no idea how often realloc() is called directly by 3rd-party extension modules. It's called directly a lot in Zope's C code, but AFAICT only to grow vectors, never to shrink them. Since we are both pretty confident that other allocators aren't like Darwin, this gunk can be #ifdef'ed to the __APPLE__ case. #ifdef's are a last resort: they almost never go away, so they complicate the code forever after, and typically stick around for years even after the platform problems they intended to address have been fixed. For obvious reasons, they're also an endless source of platform-specific bugs. Note that pymalloc already does a memcpy+free when in PyObject_Realloc(p, n) p was obtained from the system malloc or realloc but n is small enough to meet the small object threshold (pymalloc takes over small blocks that result from a PyObject_Realloc()). That's a reasonable strategy *because* n is always small in such cases. If you're going to extend this strategy to n of arbitrary size, then you may also create new performance problems for some apps on Darwin (copying n bytes can get arbitrarily expensive). ... I'm sure I'll find something, but what's important to me is that Python works well on Mac OS X, so something should happen.
I agree the socket-abuse case should be fiddled, and for more reasons than just Darwin's realloc() quirks. I don't know that there are actual problems on Darwin broader than that case (and I'm not challenging you to contrive one, I'm asking whether realloc() quirks are suspected in any other case that's known). Part of what you demonstrated when you said that pystone didn't slow down when you fiddled stuff is that pystone also didn't speed up. I also don't know that the memcpy+free wormaround is actually going to help more than it hurts overall. Yes, in the socket-abuse case, where the program routinely malloc()s strings millions of bytes larger than the socket can deliver, it would obviously help. That's not typically program behavior (however typical it may be of that specific app). More typical is shrinking a long list one element at a time, in which case about half the list remaining would get memcpy'd from time to time where such copies never get made