On 4/17/12 7:19 AM, Greg Stark wrote:
On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
All but 4 regression tests pass, but they don't really count
as failures, since they're down to an assumption in the tests that the
order certain tuples appear should be
On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
All but 4 regression tests pass, but they don't really count
as failures, since they're down to an assumption in the tests that the
order certain tuples appear should be the same as our current
quicksort
On 14 April 2012 14:34, Peter Geoghegan pe...@2ndquadrant.com wrote:
FWIW, I started playing with adding timsort to Postgres last night:
https://github.com/Peter2ndQuadrant/postgres/tree/timsort
I've fixed this feature-branch so that every qsort_arg call site
(including the tuplesort
On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Well, timsort is specifically designed to take advantage of pre-sorted
data. It does appear to have a lot of traction, as wikipedia points
out:
I hadn't heard of it. But reading up on it, it does seem like a good
fit
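Since CPython's built-in sort is itself an implementation of timsort, its advantage on presorted input is easy to demonstrate directly. A minimal sketch (the comparison-counting wrapper is illustrative scaffolding, not anything from the thread):

```python
import random
from functools import cmp_to_key

def count_comparisons(data):
    """Sort a copy of `data` with Python's timsort, counting comparisons."""
    counter = {"n": 0}
    def cmp(a, b):
        counter["n"] += 1
        return (a > b) - (a < b)
    sorted(data, key=cmp_to_key(cmp))
    return counter["n"]

presorted = list(range(1000))
shuffled = presorted[:]
random.shuffle(shuffled)

# Timsort detects the single ascending run: exactly n - 1 comparisons.
print(count_comparisons(presorted))   # 999
# Random input needs on the order of n log n comparisons instead.
print(count_comparisons(shuffled) > count_comparisons(presorted))  # True
```

Run detection is the mechanism at work: on fully sorted input timsort finds one run spanning the whole array and has nothing left to merge.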
On 14 April 2012 13:32, Greg Stark st...@mit.edu wrote:
On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan pe...@2ndquadrant.com
wrote:
Well, timsort is specifically designed to take advantage of pre-sorted
data. It does appear to have a lot of traction, as wikipedia points
out:
I hadn't
Jeff Davis pg...@j-davis.com writes:
On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote:
Yeah, that was me, and it came out of actual user complaints ten or more
years back. (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas robertmh...@gmail.com wrote:
...
One thing that seems inefficient to me about our current algorithm is
the use of the run number as a leading column in the sort key.
There's
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote:
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
So has somebody found a hole in the n log n lower bound on the cost of
comparison-based sorting? I thought that had been proven pretty
rigorously.
On 13 April 2012 16:03, Greg Stark st...@mit.edu wrote:
Fwiw it also only holds for comparison sorts. If you can sort your
items without actually comparing each item with the others then you
aren't bound by it at all. Notably algorithms like radix sort and
direct sort aren't bound by it and
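As a concrete illustration of Greg's point (this sketch is mine, not from the thread): an LSD radix sort on non-negative integers never compares one element against another, so the comparison-sort lower bound simply does not apply to it.

```python
def radix_sort(values, byte_width=4):
    """LSD radix sort for non-negative integers, one byte per pass.

    Each pass is a stable counting sort on one byte, so after the
    final (most significant) pass the list is fully sorted -- with
    no element-to-element comparisons at any point.
    """
    for shift in range(0, byte_width * 8, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values

import random
data = [random.randrange(1 << 32) for _ in range(10000)]
assert radix_sort(data) == sorted(data)
```

The cost is O(n) per pass times a fixed number of passes, but the technique only works for keys that can be decomposed into digits, which is why it cannot replace a general comparison sort.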
On 13 April 2012 17:42, Peter Geoghegan pe...@2ndquadrant.com wrote:
One insight that I had at the time was that text comparisons where the
c locale isn't used are really rather expensive, and I doubt that
there is too much that can be done to address that directly. However,
if we were to
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 13 April 2012 17:42, Peter Geoghegan pe...@2ndquadrant.com wrote:
One insight that I had at the time was that text comparisons where the
c locale isn't used are really rather expensive, and I doubt that
there is
On 13 April 2012 18:33, Robert Haas robertmh...@gmail.com wrote:
We do use insertion sort for partitions of size 7. I assume you are
referring to something else.
I stand corrected. My confusion came from the removal of a later
*switch* to insertion sort in
On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote:
Yeah, that was me, and it came out of actual user complaints ten or more
years back. (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember the
exact reasons why.) We knew
On Tue, Mar 20, 2012 at 6:16 PM, Greg Stark st...@mit.edu wrote:
On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
Frankly that analysis didn't make any sense to me at the time.
Comparing integers is fast, sure, but it's still slower than not
having to do any comparison
On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas robertmh...@gmail.com wrote:
...
One thing that seems inefficient to me about our current algorithm is
the use of the run number as a leading column in the sort key.
There's no real reason why the tuples destined for the next run need
to be
On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
That was a long time ago, of course, but I have some vague recollection
that keeping next-run tuples in the current heap achieves a net savings
in the total number of comparisons needed to heapify both runs.
Offhand I wonder
Greg Stark st...@mit.edu writes:
Offhand I wonder if this is all because we don't have the O(n) heapify
implemented.
Robert muttered something about that before, but is it real? If you
could do that, I'd think you'd have a less-than-n-log-n sorting
solution.
regards,
http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap
--
greg
On Tue, Mar 20, 2012 at 6:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
Offhand I wonder if this is all because we don't have the O(n) heapify
implemented.
I think we do already have it implemented. 1/2 the time the tuple
stays where it is after one comparison,
Greg Stark st...@mit.edu writes:
http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap
Interesting. I'm pretty sure that idea appears nowhere in Knuth
(which might mean it's new enough to have a live patent on it ...
anybody know who invented this?). But it seems like that should buy
back
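The Wikipedia section Greg links describes Floyd's bottom-up heap construction: sift down each internal node, starting from the last one. A minimal sketch of the idea (a min-heap here; tuplesort's own details differ):

```python
def heapify(a):
    """Floyd's bottom-up heap construction: O(n) total, because most
    nodes sit near the leaves and sift down only a short distance."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # last internal node up to the root
        _sift_down(a, i, n)

def _sift_down(a, i, n):
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] < a[smallest]:
                smallest = child
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

import random
a = [random.randrange(1000) for _ in range(999)]
heapify(a)
assert all(a[i] <= a[c]
           for i in range(len(a))
           for c in (2 * i + 1, 2 * i + 2) if c < len(a))
```

Inserting n elements one at a time costs O(n log n) in the worst case; the bottom-up pass is O(n) because the per-node sift distance shrinks geometrically toward the leaves, which is the savings being weighed in this subthread.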
On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote:
On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
That was a long time ago, of course, but I have some vague recollection
that keeping next-run tuples in the current heap achieves a net savings
in the total
On Tue, Mar 20, 2012 at 12:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap
Interesting. I'm pretty sure that idea appears nowhere in Knuth
(which might mean it's new enough to have a live patent on it ...
Robert Haas robertmh...@gmail.com writes:
On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote:
Offhand I wonder if this is all because we don't have the O(n) heapify
implemented.
I'm pretty sure that's not the problem. Even though our heapify is
not as efficient as it could be,
On Tue, Mar 20, 2012 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark st...@mit.edu wrote:
Offhand I wonder if this is all because we don't have the O(n) heapify
implemented.
I'm pretty sure that's not the
On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:
No. It does the opposite: it slows it down. This is a highly
surprising result but it's quite repeatable: removing comparisons
makes it slower. As previously pontificated, I think this is probably
because the heap can
On Tue, Mar 20, 2012 at 3:41 PM, Greg Stark st...@mit.edu wrote:
On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:
No. It does the opposite: it slows it down. This is a highly
surprising result but it's quite repeatable: removing comparisons
makes it slower. As
On 3/18/12 10:25 AM, Tom Lane wrote:
Jeff Janes jeff.ja...@gmail.com writes:
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, I think the logtape could use redoing.
The problem
Robert Haas robertmh...@gmail.com writes:
Yeah, I think I'm going to try implementing
quicksort-the-whole-batch-and-dump-it-out-as-a-run algorithm, just to
see how good or bad that is compared to what we have now. We may not
end up doing anything that remotely resembles that, in the end, but
On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
Frankly that analysis didn't make any sense to me at the time.
Comparing integers is fast, sure, but it's still slower than not
having to do any comparison at all.
I think you're underestimating how much it costs to
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
So has somebody found a hole in the n log n lower bound on the cost of
comparison-based sorting? I thought that had been proven pretty
rigorously.
There's not much danger of anyone finding a way around that bound
since the
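The bound Tom and Robert refer to follows from a counting argument: a comparison sort must distinguish all n! possible input orderings, and each comparison has at most two outcomes, so its decision tree needs depth at least log2(n!). With the estimate n! >= (n/e)^n:

```latex
\lceil \log_2 n! \rceil \;\ge\; \log_2\left(\frac{n}{e}\right)^{\!n} \;=\; n\log_2 n - n\log_2 e \;=\; \Omega(n \log n)
```

This is why, as Greg notes just above, only non-comparison sorts such as radix sort can escape the bound.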
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote:
There's no real reason why the tuples destined for the next run need
to be maintained in heap order; we could just store them unordered and
heapify the whole lot of them when it's time to start the next run.
This sounded
Greg Stark st...@mit.edu writes:
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote:
There's no real reason why the tuples destined for the next run need
to be maintained in heap order; we could just store them unordered and
heapify the whole lot of them when it's time to
Jeff Janes jeff.ja...@gmail.com writes:
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, I think the logtape could use redoing.
The problem there is that none of the files can be deleted
On 2012-03-18 15:25, Tom Lane wrote:
Jeff Janes jeff.ja...@gmail.com writes:
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
The problem there is that none of the files can be deleted until it
was entirely read, so you end up with all the data on disk twice. I
don't
Jeremy Harris j...@wizmail.org writes:
On 2012-03-18 15:25, Tom Lane wrote:
Yeah, that was me, and it came out of actual user complaints ten or more
years back. (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember the
exact
On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
No, it's about reducing the number of comparisons needed to maintain
the heap property.
That sounds very interesting. I didn't know it was even theoretically
possible to do that.
So has somebody found a hole in the n log n
On Sun, Mar 18, 2012 at 8:56 AM, Greg Stark st...@mit.edu wrote:
On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
No, it's about reducing the number of comparisons needed to maintain
the heap property.
That sounds very interesting. I didn't know it was even theoretically
On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote:
But it would mean we have about 1.7x more runs that need to be merged
(for initially random data). Considering the minimum merge order is
6, that increase in runs is likely not to lead to an additional level
of merging,
On Sat, Mar 17, 2012 at 10:47 AM, Greg Stark st...@mit.edu wrote:
On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote:
But it would mean we have about 1.7x more runs that need to be merged
(for initially random data). Considering the minimum merge order is
6, that
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, I think the logtape could use redoing. When your tapes are
actually physically tape drives, it is necessary to build up runs one
after the
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
The better solution would be to reduce the overhead in the first
place. While building the initial runs, there is no reason to have 3
blocks worth of overhead for each tape, when only one tape is ever
being used at a time.
On Sun, Feb 26, 2012 at 7:20 PM, Robert Haas robertmh...@gmail.com wrote:
On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote:
I'm not sure about the conclusion, but given this discussion, I'm
inclined to mark this Returned with Feedback.
OK, thanks. Does anyone have
On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote:
I'm not sure about the conclusion, but given this discussion, I'm
inclined to mark this Returned with Feedback.
OK, thanks. Does anyone have additional feedback on how tightly we
wish to manage memory usage? Is trying
Robert Haas robertmh...@gmail.com writes:
A quick Google search for external sorting algorithms suggest that the
typical way of doing an external sort is to read data until you fill
your in-memory buffer, quicksort it, and dump it out as a run. Repeat
until end-of-data; then, merge the runs
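The textbook scheme Robert describes (fill memory, quicksort, dump a run, then merge the runs) can be sketched in a few lines; `heapq.merge` below stands in for the k-way merge that tuplesort drives with its heap, and the runs are kept in memory where a real external sort would spill them to disk. The names and buffer size are illustrative, not PostgreSQL's:

```python
import heapq

def external_sort(stream, buffer_size):
    """Read until the buffer fills, sort it, and emit it as a run;
    then k-way merge all the runs at the end."""
    runs, buf = [], []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            runs.append(sorted(buf))  # "quicksort it, and dump it out as a run"
            buf = []
    if buf:
        runs.append(sorted(buf))
    return list(heapq.merge(*runs))

import random
data = [random.randrange(10**6) for _ in range(50000)]
assert external_sort(iter(data), buffer_size=4096) == sorted(data)
```

Compared with heap-based replacement selection, this produces runs of exactly buffer size rather than roughly twice that on random input, which is the ~1.7x-more-runs trade-off discussed later in the thread.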
On Tue, Feb 14, 2012 at 1:44 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes jeff.ja...@gmail.com wrote:
On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes jeff.ja...@gmail.com wrote:
On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:
The attached patch allows it to reuse that memory. On my meager
system
On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:
The attached patch allows it to reuse that memory. On my meager
system it reduced the building of an index on an integer column in a
skinny 200
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:
The attached patch allows it to reuse that memory. On my meager
system it reduced the building of an index on an integer column in a
skinny 200 million row totally randomly ordered table by about 3% from
a baseline of 25
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes jeff.ja...@gmail.com wrote:
The worst thing about the current memory usage is probably that big
machines can't qsort more than 16,777,216 tuples no matter how much
memory they have, because memtuples has to live entirely within a
single allocation,
On Sat, Jan 21, 2012 at 4:51 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote:
I think it would be better to pre-deduct the tape overhead amount we
will need if we decide to switch to tape sort from the availMem before
we even
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes jeff.ja...@gmail.com wrote:
Attached is a completely uncommitable proof of concept/work in
progress patch to get around the limitation. It shows a 2 fold
improvement when indexing an integer column on a 50,000,000 row
randomly ordered table.
Oops,
On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote:
I think it would be better to pre-deduct the tape overhead amount we
will need if we decide to switch to tape sort from the availMem before
we even start reading (and then add it back if we do indeed make that
switch). That way
In tuplesort.c, it initially reads tuples into memory until availMem
is exhausted.
It then switches to the tape sort algorithm, and allocates buffer
space for each tape it will use. This substantially over-runs the
allowedMem, and drives availMem negative.
It works off this deficit by writing
53 matches