Re: [HACKERS] Memory usage during sorting

2012-05-01 Thread Jim Nasby
On 4/17/12 7:19 AM, Greg Stark wrote: On Mon, Apr 16, 2012 at 10:42 PM, Peter Geogheganpe...@2ndquadrant.com wrote: All but 4 regression tests pass, but they don't really count as failures, since they're down to an assumption in the tests that the order certain tuples appear should be

Re: [HACKERS] Memory usage during sorting

2012-04-17 Thread Greg Stark
On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: All but 4 regression tests pass, but they don't really count as failures, since they're down to an assumption in the tests that the order certain tuples appear should be the same as our current quicksort

Re: [HACKERS] Memory usage during sorting

2012-04-16 Thread Peter Geoghegan
On 14 April 2012 14:34, Peter Geoghegan pe...@2ndquadrant.com wrote: FWIW, I started playing with adding timsort to Postgres last night: https://github.com/Peter2ndQuadrant/postgres/tree/timsort I've fixed this feature-branch so that every qsort_arg call site (including the tuplesort

Re: [HACKERS] Memory usage during sorting

2012-04-14 Thread Greg Stark
On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Well, timsort is specifically designed to take advantage of pre-sorted data. It does appear to have a lot of traction, as wikipedia points out: I hadn't heard of it. But reading up on it it does seem like a good fit

Re: [HACKERS] Memory usage during sorting

2012-04-14 Thread Peter Geoghegan
On 14 April 2012 13:32, Greg Stark st...@mit.edu wrote: On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Well, timsort is specifically designed to take advantage of pre-sorted data. It does appear to have a lot of traction, as wikipedia points out: I hadn't

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote: Yeah, that was me, and it came out of actual user complaints ten or more years back. (It's actually not 2X growth but more like 4X growth according to the comments in logtape.c, though I no longer remember

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas robertmh...@gmail.com wrote: ... One thing that seems inefficient to me about our current algorithm is the use of the run number as a leading column in the sort key. There's

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: So has somebody found a hole in the n log n lower bound on the cost of comparison-based sorting?  I thought that had been proven pretty rigorously.

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 16:03, Greg Stark st...@mit.edu wrote: Fwiw it also only holds for comparison sorts. If you can sort your items without actually comparing each item with the others then you aren't bound by it at all. Notably algorithms like radix sort and direct sort aren't bound by it and

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 17:42, Peter Geoghegan pe...@2ndquadrant.com wrote: One insight that I had at the time was that text comparisons where the c locale isn't used are really rather expensive, and I doubt that there is too much that can be done to address that directly.  However, if we were to

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 13 April 2012 17:42, Peter Geoghegan pe...@2ndquadrant.com wrote: One insight that I had at the time was that text comparisons where the c locale isn't used are really rather expensive, and I doubt that there is

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 18:33, Robert Haas robertmh...@gmail.com wrote: We do use insertion sort for partitions of size 7.  I assume you are referring to something else. I stand corrected. My confusion came from the removal of a later *switch* to insertion sort in

Re: [HACKERS] Memory usage during sorting

2012-04-12 Thread Jeff Davis
On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote: Yeah, that was me, and it came out of actual user complaints ten or more years back. (It's actually not 2X growth but more like 4X growth according to the comments in logtape.c, though I no longer remember the exact reasons why.) We knew

Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Robert Haas
On Tue, Mar 20, 2012 at 6:16 PM, Greg Stark st...@mit.edu wrote: On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote: Frankly that analysis didn't make any sense to me at the time. Comparing integers is fast, sure, but it's still slower than not having to do any comparison

Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Jeff Janes
On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas robertmh...@gmail.com wrote: ... One thing that seems inefficient to me about our current algorithm is the use of the run number as a leading column in the sort key. There's no real reason why the tuples destined for the next run need to be

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: That was a long time ago, of course, but I have some vague recollection that keeping next-run tuples in the current heap achieves a net savings in the total number of comparisons needed to heapify both runs. Offhand I wonder

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Greg Stark st...@mit.edu writes: Offhand I wonder if this is all because we don't have the O(n) heapify implemented. Robert muttered something about that before, but is it real? If you could do that, I'd think you'd have a less-than-n-log-n sorting solution. regards,

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Jeff Janes
On Tue, Mar 20, 2012 at 6:31 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: Offhand I wonder if this is all because we don't have the O(n) heapify implemented. I think we do already have it implemented. 1/2 the time the tuple stays where it is after one comparison,

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Greg Stark st...@mit.edu writes: http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap Interesting. I'm pretty sure that idea appears nowhere in Knuth (which might mean it's new enough to have a live patent on it ... anybody know who invented this?). But it seems like that should buy back

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote: On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: That was a long time ago, of course, but I have some vague recollection that keeping next-run tuples in the current heap achieves a net savings in the total

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 12:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap Interesting.  I'm pretty sure that idea appears nowhere in Knuth (which might mean it's new enough to have a live patent on it ...

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote: Offhand I wonder if this is all because we don't have the O(n) heapify implemented. I'm pretty sure that's not the problem. Even though our heapify is not as efficient as it could be,

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote: Offhand I wonder if this is all because we don't have the O(n) heapify implemented. I'm pretty sure that's not the

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote: No.  It does the opposite: it slows it down.  This is a highly surprising result but it's quite repeatable: removing comparisons makes it slower.  As previously pontificated, I think this is probably because the heap can

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 3:41 PM, Greg Stark st...@mit.edu wrote: On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote: No.  It does the opposite: it slows it down.  This is a highly surprising result but it's quite repeatable: removing comparisons makes it slower.  As

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Jim Nasby
On 3/18/12 10:25 AM, Tom Lane wrote: Jeff Janesjeff.ja...@gmail.com writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haasrobertmh...@gmail.com wrote: On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janesjeff.ja...@gmail.com wrote: Anyway, I think the logtape could use redoing. The problem

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Yeah, I think I'm going to try implementing quicksort-the-whole-batch-and-dump-it-out-as-a-run algorithm, just to see how good or bad that is compared to what we have now. We may not end up doing anything that remotely resembles that, in the end, but

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote: Frankly that analysis didn't make any sense to me at the time. Comparing integers is fast, sure, but it's still slower than not having to do any comparison at all. I think you're underestimating how much it costs to

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: So has somebody found a hole in the n log n lower bound on the cost of comparison-based sorting?  I thought that had been proven pretty rigorously. There's not much danger of anyone finding a way around that bound since the

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote: There's no real reason why the tuples destined for the next run need to be maintained in heap order; we could just store them unordered and heapify the whole lot of them when it's time to start the next run. This sounded

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote: There's no real reason why the tuples destined for the next run need to be maintained in heap order; we could just store them unordered and heapify the whole lot of them when it's time to

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote: Anyway, I think the logtape could use redoing. The problem there is that none of the files can be deleted

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeremy Harris
On 2012-03-18 15:25, Tom Lane wrote: Jeff Janesjeff.ja...@gmail.com writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haasrobertmh...@gmail.com wrote: The problem there is that none of the files can be deleted until it was entirely read, so you end up with all the data on disk twice. I don't

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Tom Lane
Jeremy Harris j...@wizmail.org writes: On 2012-03-18 15:25, Tom Lane wrote: Yeah, that was me, and it came out of actual user complaints ten or more years back. (It's actually not 2X growth but more like 4X growth according to the comments in logtape.c, though I no longer remember the exact

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Greg Stark
On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, it's about reducing the number of comparisons needed to maintain the heap property. That sounds very interesting.  I didn't know it was even theoretically possible to do that. So has somebody found a hole in the n log n

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeff Janes
On Sun, Mar 18, 2012 at 8:56 AM, Greg Stark st...@mit.edu wrote: On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, it's about reducing the number of comparisons needed to maintain the heap property. That sounds very interesting.  I didn't know it was even theoretically

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Greg Stark
On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote: But it would mean we have about 1.7x  more runs that need to be merged (for initially random data).  Considering the minimum merge order is 6, that increase in runs is likely not to lead to an additional level of merging,

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes
On Sat, Mar 17, 2012 at 10:47 AM, Greg Stark st...@mit.edu wrote: On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote: But it would mean we have about 1.7x  more runs that need to be merged (for initially random data).  Considering the minimum merge order is 6, that

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote: Anyway, I think the logtape could use redoing.  When your tapes are actually physically tape drives, it is necessary to build up runs one after the

Re: [HACKERS] Memory usage during sorting

2012-03-07 Thread Robert Haas
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote: The better solution would be to reduce the overhead in the first place.  While building the initial runs, there is no reason to have 3 blocks worth of overhead for each tape, when only one tape is ever being used at a time.

Re: [HACKERS] Memory usage during sorting

2012-03-03 Thread Jeff Janes
On Sun, Feb 26, 2012 at 7:20 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote: I'm not sure about the conclusion, but given this discussion, I'm inclined to mark this Returned with Feedback. OK, thanks.  Does anyone have

Re: [HACKERS] Memory usage during sorting

2012-02-26 Thread Robert Haas
On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote: I'm not sure about the conclusion, but given this discussion, I'm inclined to mark this Returned with Feedback. OK, thanks.  Does anyone have additional feed-back on how tightly we wish to manage memory usage?  Is trying

Re: [HACKERS] Memory usage during sorting

2012-02-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: A quick Google search for external sorting algorithms suggest that the typical way of doing an external sort is to read data until you fill your in-memory buffer, quicksort it, and dump it out as a run. Repeat until end-of-data; then, merge the runs

Re: [HACKERS] Memory usage during sorting

2012-02-25 Thread Jeff Janes
On Tue, Feb 14, 2012 at 1:44 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Re: [HACKERS] Memory usage during sorting

2012-02-14 Thread Hitoshi Harada
On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote: The attached patch allows it to reuse that memory.  On my meager system

Re: [HACKERS] Memory usage during sorting

2012-02-11 Thread Jeff Janes
On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote: The attached patch allows it to reuse that memory.  On my meager system it reduced the building of an index on an integer column in a skinny 200

Re: [HACKERS] Memory usage during sorting

2012-02-08 Thread Hitoshi Harada
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote: The attached patch allows it to reuse that memory.  On my meager system it reduced the building of an index on an integer column in a skinny 200 million row totally randomly ordered table by about 3% from a baseline of 25

Re: [HACKERS] Memory usage during sorting

2012-02-08 Thread Hitoshi Harada
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes jeff.ja...@gmail.com wrote: The worst thing about the current memory usage is probably that big machines can't qsort more than 16,777,216 tuples no matter how much memory they have, because memtuples has to live entirely within a single allocation,

Re: [HACKERS] Memory usage during sorting

2012-02-04 Thread Jeff Janes
On Sat, Jan 21, 2012 at 4:51 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote: I think it would be better to pre-deduct the tape overhead amount we will need if we decide to switch to tape sort from the availMem before we even

Re: [HACKERS] Memory usage during sorting

2012-02-04 Thread Jeff Janes
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes jeff.ja...@gmail.com wrote: Attached is a completely uncommitable proof of concept/work in progress patch to get around the limitation.  It shows a 2 fold improvement when indexing an integer column on a 50,000,000 row randomly ordered table. Oops,

Re: [HACKERS] Memory usage during sorting

2012-01-21 Thread Peter Geoghegan
On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote: I think it would be better to pre-deduct the tape overhead amount we will need if we decide to switch to tape sort from the availMem before we even start reading (and then add it back if we do indeed make that switch).  That way

[HACKERS] Memory usage during sorting

2012-01-15 Thread Jeff Janes
In tuplesort.c, it initially reads tuples into memory until availMem is exhausted. It then switches to the tape sort algorithm, and allocates buffer space for each tape it will use. This substantially over-runs the allowedMem, and drives availMem negative. It works off this deficit by writing