I assume we never came up with a TODO from this thread:
---
On Tue, Dec 3, 2013 at 11:30:44PM +0000, Greg Stark wrote:
At multiple conferences I've heard about people trying all sorts of
gymnastics to avoid ANALYZE which
On 12/17/2013 12:06 AM, Jeff Janes wrote:
On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
very easy, patch attached. Your mileage may vary, but I'm seeing a nice
gain from this on my
On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
very easy, patch attached. Your mileage may vary, but I'm seeing a nice
gain from this on my laptop. Taking a 30000 page sample of a table
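For reference, the technique here is just hinting the kernel about the
sampled blocks before reading them, so the reads overlap instead of
seeking synchronously one block at a time. A minimal, hypothetical C
sketch of that idea (not Heikki's actual patch; the function and
parameter names are made up, only posix_fadvise() and
POSIX_FADV_WILLNEED are real):

    #include <fcntl.h>

    #define BLCKSZ 8192            /* PostgreSQL's default block size */

    /* Hint the kernel to start reading every sampled block now. */
    static void
    prefetch_sample_blocks(int fd, const unsigned *blocknums, int nblocks)
    {
        for (int i = 0; i < nblocks; i++)
            posix_fadvise(fd, (off_t) blocknums[i] * (off_t) BLCKSZ,
                          (off_t) BLCKSZ, POSIX_FADV_WILLNEED);
    }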
Here's an analysis of Jeff Janes' simple example of a table where our
n_distinct estimate is way off.
On Dec11, 2013, at 00:03 , Jeff Janes jeff.ja...@gmail.com wrote:
create table baz as select floor(random()*1000), md5(random()::text) from
generate_series(1,1);
create table baz2
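For context, the estimator analyze.c applies to get n_distinct is the
Duj1 estimator of Haas and Stokes: with a sample of n rows containing d
distinct values, f_1 of which occur exactly once in the sample, and a
table of N rows,

    \hat{D} = \frac{n d}{n - f_1 + f_1 \cdot n / N}

The estimate leans heavily on f_1, and block-level sampling can skew
f_1 whenever duplicate values cluster within pages.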
On Wed, Dec 11, 2013 at 2:33 PM, Greg Stark st...@mit.edu wrote:
I think we're all wet here. I don't see any bias towards larger or smaller
rows. Larger tuples will be on a larger number of pages but there will be
fewer of them on any one page. The average effect should be the same.
Smaller
On Thu, Dec 12, 2013 at 6:39 AM, Florian Pflug f...@phlo.org wrote:
Here's an analysis of Jeff Janes' simple example of a table where our
n_distinct estimate is way off.
On Dec11, 2013, at 00:03 , Jeff Janes jeff.ja...@gmail.com wrote:
create table baz as select floor(random()*1000),
Jeff Janes jeff.ja...@gmail.com writes:
It would be relatively easy to fix this if we trusted the number of visible
rows in each block to be fairly constant. But without that assumption, I
don't see a way to fix the sample selection process without reading the
entire table.
Yeah, varying
On Thu, Dec 12, 2013 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Jeff Janes jeff.ja...@gmail.com writes:
It would be relatively easy to fix this if we trusted the number of visible
rows in each block to be fairly constant. But without that assumption, I
don't see a way to fix the sample
On 12/12/2013 10:33 AM, Claudio Freire wrote:
Well, why not take a supersample containing all visible tuples from N
selected blocks, and do bootstrapping over it, with subsamples of M
independent rows each?
Well, we still need to look at each individual block to determine
grouping correlation.
On Thu, Dec 12, 2013 at 3:56 PM, Josh Berkus j...@agliodbs.com wrote:
Estimated grouping should, however, affect MCVs. In cases where we
estimate that grouping levels are high, the expected % of observed
values should be discounted somehow. That is, with total random
distribution you have a
On Thu, Dec 12, 2013 at 10:33 AM, Claudio Freire klaussfre...@gmail.com wrote:
On Thu, Dec 12, 2013 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Jeff Janes jeff.ja...@gmail.com writes:
It would be relatively easy to fix this if we trusted the number of
visible
rows in each block to be
On Thu, Dec 12, 2013 at 4:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Well, why not take a supersample containing all visible tuples from N
selected blocks, and do bootstrapping over it, with subsamples of M
independent rows each?
Bootstrapping methods generally do not work well when ties
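As a rough illustration of the proposal (a hypothetical sketch, not
code from any posted patch), a single bootstrap replicate draws M rows
with replacement from the supersample and evaluates the statistic;
repeating it many times gives the statistic's sampling distribution:

    #include <stdlib.h>

    /* One bootstrap replicate over a supersample of n values: draw m
     * rows with replacement and evaluate the caller's statistic. */
    static double
    bootstrap_replicate(const double *supersample, int n, int m,
                        double (*stat)(const double *vals, int nvals))
    {
        double *sub = malloc(m * sizeof(double));
        for (int i = 0; i < m; i++)
            sub[i] = supersample[random() % n];   /* with replacement */
        double result = stat(sub, m);
        free(sub);
        return result;
    }

Jeff's objection bites exactly at the with-replacement draw: it
manufactures ties, which distorts distinct-count estimation in
particular.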
On Tue, Dec 3, 2013 at 3:30 PM, Greg Stark st...@mit.edu wrote:
At multiple conferences I've heard about people trying all sorts of
gymnastics to avoid ANALYZE which they expect to take too long and
consume too much I/O. This is especially a big complaint after upgrades
when their new database
On Dec12, 2013, at 19:29 , Tom Lane t...@sss.pgh.pa.us wrote:
However ... where this thread started was not about trying to reduce
the remaining statistical imperfections in our existing sampling method.
It was about whether we could reduce the number of pages read for an
acceptable cost in
On 11/12/13 19:34, Simon Riggs wrote:
Realistically, I never heard of an Oracle DBA doing advanced
statistical mathematics before setting the sample size on ANALYZE. You
use the default and bump it up if the sample is insufficient for the
data.
I'm not sure that Oracle's stats and optimizer
On Wed, Dec 11, 2013 at 12:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
Yes, it is not a perfect statistical sample. All sampling is subject
to an error that is data dependent.
Well there's random variation due to the limitations of dealing with a
sample. And then there's systemic biases
On Wed, Dec 11, 2013 at 11:01 AM, Greg Stark st...@mit.edu wrote:
I'm not actually sure there is any systemic bias here. The larger
number of rows per block generates less precise results but from my
thought experiments they seem to still be accurate?
So I've done some empirical tests for a
On Wed, Dec 11, 2013 at 12:08 PM, Greg Stark st...@mit.edu wrote:
The only thing I can think
of is maybe the most common elements are being selected preferentially
from the early part of the sample which is removing a substantial part
of the lower end of the range. But even removing 100 from
On 12/11/2013 02:08 PM, Greg Stark wrote:
On Wed, Dec 11, 2013 at 11:01 AM, Greg Stark st...@mit.edu wrote:
I'm not actually sure there is any systemic bias here. The larger
number of rows per block generates less precise results but from my
thought experiments they seem to still be accurate?
On Dec10, 2013, at 15:32 , Claudio Freire klaussfre...@gmail.com wrote:
On Tue, Dec 10, 2013 at 11:02 AM, Greg Stark st...@mit.edu wrote:
On 10 Dec 2013 08:28, Albe Laurenz laurenz.a...@wien.gv.at wrote:
Doesn't all that assume a normally distributed random variable?
I don't think so
On 11 December 2013 12:08, Greg Stark st...@mit.edu wrote:
So there is something clearly wonky in the histogram stats that's
affected by the distribution of the sample.
...in the case where the avg width changes in a consistent manner
across the table.
Well spotted.
ISTM we can have a
Greg Stark st...@mit.edu writes:
So I've done some empirical tests for a table generated by:
create table sizeskew as (select i,j,repeat('i',i) from
generate_series(1,1000) as i, generate_series(1,1000) as j);
I find that using the whole block doesn't cause any problem with the
avg_width
I wrote:
Hm. You can only take N rows from a block if there actually are at least
N rows in the block. So the sampling rule I suppose you are using is
select up to N rows from each sampled block --- and that is going to
favor the contents of blocks containing narrower-than-average rows.
Oh,
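For what it's worth, the textbook correction for an up-to-N-rows-per-block
rule is Horvitz-Thompson weighting, which nothing in this thread
implements: if a sampled block holds b visible rows and min(N, b) of
them are kept, a row in that block is included with probability
proportional to min(N, b)/b, so weighting each kept row by the inverse,

    w(b) = \frac{b}{\min(N, b)} = \max\left(1, \frac{b}{N}\right)

makes the estimates unbiased with respect to block density, whichever
direction the raw bias runs.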
On 12/12/13 06:22, Tom Lane wrote:
I wrote:
Hm. You can only take N rows from a block if there actually are at least
N rows in the block. So the sampling rule I suppose you are using is
select up to N rows from each sampled block --- and that is going to
favor the contents of blocks
On 12/12/13 06:22, Tom Lane wrote:
I wrote:
Hm. You can only take N rows from a block if there actually are at least
N rows in the block. So the sampling rule I suppose you are using is
select up to N rows from each sampled block --- and that is going to
favor the contents of blocks
On 12/12/13 07:22, Gavin Flower wrote:
On 12/12/13 06:22, Tom Lane wrote:
I wrote:
Hm. You can only take N rows from a block if there actually are at
least
N rows in the block. So the sampling rule I suppose you are using is
select up to N rows from each sampled block --- and that is going
Gavin Flower gavinflo...@archidevsys.co.nz wrote:
For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages. In the pathologically worst case, assuming
maximum packing density and no page has both types: the large rows would
occupy 500 pages and the smaller
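Completing the arithmetic of that example:

    \frac{1000}{400/200} = 500 \text{ pages of large rows}, \qquad
    \frac{1000}{400/20} = 50 \text{ pages of small rows}

so a uniformly sampled page is a large-row page with probability
500/550, about 91%, even though large rows are only half of all rows.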
On 12/12/13 08:14, Gavin Flower wrote:
On 12/12/13 07:22, Gavin Flower wrote:
On 12/12/13 06:22, Tom Lane wrote:
I wrote:
Hm. You can only take N rows from a block if there actually are at
least
N rows in the block. So the sampling rule I suppose you are using is
select up to N rows from
On Tue, Dec 10, 2013 at 4:48 PM, Peter Geoghegan p...@heroku.com wrote:
Why would I even mention that to a statistician? We want guidance. But
yes, I bet I could give a statistician an explanation of statistics
target that they'd understand without too much trouble.
Actually, I think that if
On 12/12/13 08:31, Kevin Grittner wrote:
Gavin Flower gavinflo...@archidevsys.co.nz wrote:
For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages. In the pathologically worst case, assuming
maximum packing density and no page has both types: the large rows
On 12/12/13 08:39, Gavin Flower wrote:
On 12/12/13 08:31, Kevin Grittner wrote:
Gavin Flower gavinflo...@archidevsys.co.nz wrote:
For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages. In the pathologically worst case, assuming
maximum packing density and
On 12/12/13 09:12, Gavin Flower wrote:
On 12/12/13 08:39, Gavin Flower wrote:
On 12/12/13 08:31, Kevin Grittner wrote:
Gavin Flower gavinflo...@archidevsys.co.nz wrote:
For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes,
using 400 byte pages. In the pathologically worst
I think we're all wet here. I don't see any bias towards larger or smaller
rows. Larger tuples will be on a larger number of pages but there will be
fewer of them on any one page. The average effect should be the same.
Smaller values might have a higher variance with block based sampling than
On Thu, Dec 12, 2013 at 07:22:59AM +1300, Gavin Flower wrote:
Surely we want to sample a 'constant fraction' (obviously, in
practice you have to sample an integral number of rows in a page!)
of rows per page? The simplest way, as Tom suggests, is to use all
the rows in a page.
However, if
On 12/11/2013 02:39 PM, Martijn van Oosterhout wrote:
In this discussion we've mostly used block = 1 postgresql block of 8k.
But when reading from a disk once you've read one block you can
basically read the following ones practically for free.
So I wonder if you could make your sampling
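A hypothetical sketch of that clustered-read idea (the run length is an
assumed tunable, not anything proposed concretely in the thread):

    #include <stdlib.h>

    #define RUN_LEN 16          /* assumed; would be tuned to readahead */

    /* Pick nruns random starting blocks and take RUN_LEN consecutive
     * blocks from each, so the kernel's readahead turns most of the
     * I/O into sequential reads. Returns the count written to out[]. */
    static int
    sample_block_runs(unsigned nblocks, unsigned *out, int nruns)
    {
        int n = 0;
        for (int r = 0; r < nruns; r++)
        {
            unsigned start = random() % nblocks;
            for (unsigned b = start; b < start + RUN_LEN && b < nblocks; b++)
                out[n++] = b;
        }
        return n;
    }

The statistical catch, which the rest of the thread keeps circling, is
that rows within a run are correlated, so the effective sample size is
well below the raw row count.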
Greg Stark wrote:
It's also applicable for the other stats; histogram buckets constructed
from a 5% sample are more likely to be accurate than those constructed
from a 0.1% sample. Same with nullfrac. The degree of improved
accuracy, would, of course, require some math to determine.
This
On 10 Dec 2013 08:28, Albe Laurenz laurenz.a...@wien.gv.at wrote:
Doesn't all that assume a normally distributed random variable?
I don't think so because of the law of large numbers. If you have a large
population and sample it, the sample behaves like a normal distribution even
if the
Greg Stark wrote:
Doesn't all that assume a normally distributed random variable?
I don't think so because of the law of large numbers. If you have a large
population and sample it, the
sample behaves like a normal distribution even if the distribution of the
population isn't.
Statistics
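The property actually being invoked is the central limit theorem rather
than the law of large numbers: for samples of size n from any
population with mean \mu and finite variance \sigma^2,

    \bar{X}_n \approx \mathcal{N}(\mu, \sigma^2 / n)

for large n, no matter how non-normal the population itself is.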
On Tue, Dec 10, 2013 at 11:02 AM, Greg Stark st...@mit.edu wrote:
On 10 Dec 2013 08:28, Albe Laurenz laurenz.a...@wien.gv.at wrote:
Doesn't all that assume a normally distributed random variable?
I don't think so because of the law of large numbers. If you have a large
population and
On Tue, Dec 10, 2013 at 11:32 AM, Claudio Freire klaussfre...@gmail.com wrote:
On Tue, Dec 10, 2013 at 11:02 AM, Greg Stark st...@mit.edu wrote:
On 10 Dec 2013 08:28, Albe Laurenz laurenz.a...@wien.gv.at wrote:
Doesn't all that assume a normally distributed random variable?
I don't think
On 6 December 2013 09:21, Andres Freund and...@2ndquadrant.com wrote:
On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
Has anyone ever thought about opportunistic ANALYZE piggy-backing on
other full-table scans? That doesn't really help Greg, because his
complaint is mostly that a fresh
On 2013-12-10 19:23:37 +0000, Simon Riggs wrote:
On 6 December 2013 09:21, Andres Freund and...@2ndquadrant.com wrote:
On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
Has anyone ever thought about opportunistic ANALYZE piggy-backing on
other full-table scans? That doesn't really help
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
However, these things presume that we need to continue scanning most
of the blocks of the table, which I don't think needs to be the case.
There is a better way.
Do they? I think it's one opportunistic way of
On 10 December 2013 19:49, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
However, these things presume that we need to continue scanning most
of the blocks of the table, which I don't think needs to be the case.
There is a
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of block sampling would be preferable
for most or all purposes.
As
On Tue, Dec 10, 2013 at 7:54 PM, Josh Berkus j...@agliodbs.com wrote:
As discussed, we need math though. Does anyone have an ACM subscription
and time to do a search? Someone must. We can buy one with community
funds, but no reason to do so if we don't have to.
Anyone in a university likely
On 10 December 2013 19:54, Josh Berkus j...@agliodbs.com wrote:
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of
On Tue, Dec 10, 2013 at 11:59 AM, Greg Stark st...@mit.edu wrote:
But I don't really think this is the right way to go about this.
Research papers are going to turn up pretty specialized solutions that
are probably patented. We don't even have the basic understanding we
need. I suspect a basic
On 12/10/2013 10:00 PM, Simon Riggs wrote:
On 10 December 2013 19:54, Josh Berkus j...@agliodbs.com wrote:
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
I don't think that anyone believes that not doing block sampling
On 11/12/13 09:19, Heikki Linnakangas wrote:
On 12/10/2013 10:00 PM, Simon Riggs wrote:
On 10 December 2013 19:54, Josh Berkus j...@agliodbs.com wrote:
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs
si...@2ndquadrant.com wrote:
I don't think that
On 12/10/2013 01:33 PM, Mark Kirkwood wrote:
Yeah - and we seem to be back to Josh's point about needing 'some math'
to cope with the rows within a block not being a purely random selection.
Well, sometimes they are effectively random. But sometimes they are
not. The Chaudhuri et al paper had a
On Tue, Dec 10, 2013 at 7:49 PM, Peter Geoghegan p...@heroku.com wrote:
Back in 2005/6, I advocated a block sampling method, as described by
Chaudhuri et al (ref?)
I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of block sampling would be
On Mon, Dec 9, 2013 at 2:37 PM, Robert Haas robertmh...@gmail.com wrote:
On Mon, Dec 9, 2013 at 4:18 PM, Jeff Janes jeff.ja...@gmail.com wrote:
My reading of the code is that if it is not in the MCV, then it is
assumed
to have the average selectivity (about 1/n_distinct, but deflating top
On 12/10/13 2:17 PM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:59 AM, Greg Stark st...@mit.edu wrote:
But I don't really think this is the right way to go about this.
Research papers are going to turn up pretty specialized solutions that
are probably patented. We don't even have the
On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby j...@nasby.net wrote:
I agree that looking for information on block level sampling
specifically, and its impact on estimation quality is likely to not
turn up very much, and whatever it does turn up will have patent
issues.
We have an entire
On 10 December 2013 23:43, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby j...@nasby.net wrote:
I agree that looking for information on block level sampling
specifically, and its impact on estimation quality is likely to not
turn up very much, and whatever it
On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
Block sampling, with parameter to specify sample size. +1
Simon this is very frustrating. Can you define block sampling?
On 11 December 2013 00:28, Greg Stark st...@mit.edu wrote:
On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs si...@2ndquadrant.com
wrote:
Block sampling, with parameter to specify sample size. +1
Simon this is very frustrating. Can you define block sampling?
Blocks selected using Vitter's
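Since the term keeps causing confusion, here is a minimal reservoir
sampler in the Vitter family (Algorithm R), selecting k blocks
uniformly from a stream; this is a sketch only, and for the record
analyze.c itself uses Knuth's Algorithm S for block selection and a
Vitter reservoir for rows:

    #include <stdlib.h>

    /* After processing block b, every block seen so far occupies a
     * slot in the k-entry reservoir with equal probability k/(b+1). */
    static void
    reservoir_sample_blocks(unsigned nblocks, unsigned *reservoir, unsigned k)
    {
        for (unsigned b = 0; b < nblocks; b++)
        {
            if (b < k)
                reservoir[b] = b;                 /* fill phase */
            else
            {
                unsigned j = random() % (b + 1);  /* uniform in [0, b] */
                if (j < k)
                    reservoir[j] = b;             /* keep with prob k/(b+1) */
            }
        }
    }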
On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
When we select a block we should read all rows on that block, to help
identify the extent of clustering within the data.
So how do you interpret the results of the sample read that way that
doesn't introduce bias?
On Tue, Dec 10, 2013 at 4:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
err, so what does stats target mean exactly in statistical theory?
Why would I even mention that to a statistician? We want guidance. But
yes, I bet I could give a statistician an explanation of statistics
target that
On 11 December 2013 00:44, Greg Stark st...@mit.edu wrote:
On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
When we select a block we should read all rows on that block, to help
identify the extent of clustering within the data.
So how do you interpret the results of
For what it's worth.
I'll quote Chaudhuri et al.'s first line from the abstract about block
sampling.
Block-level sampling is far more efficient than true uniform-random
sampling over a large database, but prone to significant errors if used
to create database statistics.
And after
Peter Geoghegan p...@heroku.com writes:
Again, it isn't as if the likely efficacy of *some* block sampling
approach is in question. I'm sure analyze.c is currently naive about
many things.
It's not *that* naive; this is already about a third-generation algorithm.
The last major revision
On Tuesday, December 10, 2013, Simon Riggs wrote:
On 11 December 2013 00:28, Greg Stark st...@mit.edu wrote:
On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs
si...@2ndquadrant.com
wrote:
Block sampling, with parameter to specify sample size. +1
Simon this is
On 11 December 2013 01:27, Sergey E. Koposov m...@sai.msu.ru wrote:
For what it's worth.
I'll quote Chaudhuri et al.'s first line from the abstract about block
sampling.
Block-level sampling is far more efficient than true uniform-random
sampling over a large database, but prone to
On Tue, Dec 10, 2013 at 10:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
On 11 December 2013 01:27, Sergey E. Koposov m...@sai.msu.ru wrote:
For what it's worth.
I'll quote Chaudhuri et al.'s first line from the abstract about block
sampling.
Block-level sampling is far more efficient
On 12/11/2013 01:44 AM, Greg Stark wrote:
On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
When we select a block we should read all rows on that block, to help
identify the extent of clustering within the data.
So how do you interpret the results of the sample read
Greg,
I really don't believe the 5% thing. It's not enough for n_distinct
and it's *far* too high a value for linear properties like histograms
or nullfrac etc.
Actually, it is enough for n_distinct, or more properly, 5% is as good
as you can get for n_distinct unless you're going to jump to
Josh Berkus j...@agliodbs.com writes:
Reading 5% of a 200GB table is going to be considerably faster than
reading the whole thing, if that 5% is being scanned in a way that the
FS understands.
Really? See the upthread point that reading one sector from each track
has just as much seek
On Mon, Dec 9, 2013 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote:
I really don't believe the 5% thing. It's not enough for n_distinct
and it's *far* too high a value for linear properties like histograms
or nullfrac etc.
Actually, it is enough for n_distinct, or more properly, 5% is as
On Mon, Dec 9, 2013 at 6:03 PM, Josh Berkus j...@agliodbs.com wrote:
It's also applicable for the other stats; histogram buckets constructed
from a 5% sample are more likely to be accurate than those constructed
from a 0.1% sample. Same with nullfrac. The degree of improved
accuracy,
On Mon, Dec 9, 2013 at 6:54 PM, Greg Stark st...@mit.edu wrote:
This 'some math' is straightforward basic statistics. The 95th
percentile confidence interval for a sample consisting of 300 samples
from a population of a 1 million would be 5.66%. A sample consisting
of 1000 samples would have a
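Checking those numbers: the worst-case (p = 0.5) margin of error at 95%
confidence is

    E = z \sqrt{\frac{p(1-p)}{n}} = 1.96 \sqrt{\frac{0.25}{300}} \approx 0.0566

for n = 300, and 1.96 \sqrt{0.25/1000} \approx 0.031 for n = 1000; the
finite-population correction for N = 1,000,000 is negligible at these
sample sizes.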
On Sat, Dec 7, 2013 at 11:46 AM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Dec 3, 2013 at 6:30 PM, Greg Stark st...@mit.edu wrote:
I always gave the party line that ANALYZE only takes a small
constant-sized sample so even very large tables should be very quick.
But after hearing the
On 12/6/13 3:21 AM, Andres Freund wrote:
On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
Has anyone ever thought about opportunistic ANALYZE piggy-backing on
other full-table scans? That doesn't really help Greg, because his
complaint is mostly that a fresh ANALYZE is too expensive, but it
On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
On 12/08/2013 08:14 PM, Greg Stark wrote:
The whole accounts table is 1.2GB and contains 10 million rows. As
expected with rows_per_block set to 1 it reads 240MB of that
containing nearly 2 million rows (and takes nearly 20s -- doing a full
table
On Mon, Dec 9, 2013 at 1:18 PM, Jeff Janes jeff.ja...@gmail.com wrote:
I don't recall ever tracing a bad plan down to a bad n_distinct.
It does happen. I've seen it several times.
On 12/09/2013 11:35 PM, Jim Nasby wrote:
On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
On 12/08/2013 08:14 PM, Greg Stark wrote:
The whole accounts table is 1.2GB and contains 10 million rows. As
expected with rows_per_block set to 1 it reads 240MB of that
containing nearly 2 million rows (and
On Mon, Dec 9, 2013 at 6:47 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 12/09/2013 11:35 PM, Jim Nasby wrote:
On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
On 12/08/2013 08:14 PM, Greg Stark wrote:
The whole accounts table is 1.2GB and contains 10 million rows. As
expected
On Mon, Dec 9, 2013 at 4:18 PM, Jeff Janes jeff.ja...@gmail.com wrote:
My reading of the code is that if it is not in the MCV, then it is assumed
to have the average selectivity (about 1/n_distinct, but deflating top and
bottom for the MCV list). There is also a check that it is less than the
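Written out, the rule being described: with an MCV list of m values
whose frequencies sum to s, and an overall distinct estimate D, a
non-MCV value is estimated at

    \text{selectivity} = \frac{1 - s}{D - m}

ignoring the null fraction for simplicity; the check mentioned above
clamps the result so that a non-MCV value is never estimated as more
frequent than the least common MCV.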
On 12/09/2013 11:56 PM, Claudio Freire wrote:
On Mon, Dec 9, 2013 at 6:47 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 12/09/2013 11:35 PM, Jim Nasby wrote:
On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
On 12/08/2013 08:14 PM, Greg Stark wrote:
The whole accounts table is
On Mon, Dec 9, 2013 at 8:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 12/09/2013 11:56 PM, Claudio Freire wrote:
Without patches to the kernel, it is much better.
posix_fadvise interferes with read-ahead, so posix_fadvise on, say,
bitmap heap scans (or similarly sorted analyze
Claudio Freire klaussfre...@gmail.com wrote:
On Mon, Dec 9, 2013 at 8:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to
be very
easy, patch attached. Your mileage may vary, but I'm seeing a nice
gain from
this on my
On Mon, Dec 9, 2013 at 8:45 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
Claudio Freire klaussfre...@gmail.com wrote:
On Mon, Dec 9, 2013 at 8:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to
be very
easy,
Heikki Linnakangas hlinnakan...@vmware.com writes:
Maybe. Or maybe the heuristic read ahead isn't significant/helpful, when
you're prefetching with posix_fadvise anyway.
Yeah. If we're not reading consecutive blocks, readahead is unlikely
to do anything anyhow.
Claudio's comments do suggest
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
very easy, patch attached. Your mileage may vary, but I'm seeing a
nice gain from this on my laptop. Taking a 30000 page sample of a
table with 717717 pages (ie. slightly
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:
I did a test run:
pgbench scale 2000 (pgbench_accounts approx 25GB).
postgres 9.4
i7 3.5Ghz Cpu
16GB Ram
500 GB Velociraptor 10K
(cold os and pg cache both runs)
Without patch: ANALYZE pgbench_accounts 90s
With patch: ANALYZE
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to
be very easy, patch attached. Your mileage may vary, but I'm seeing a
nice gain from this on my laptop. Taking a 30000 page sample of a
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to
be very easy, patch attached. Your mileage may vary, but I'm seeing
a nice gain from this on my
On 12/06/2013 09:52 AM, Peter Geoghegan wrote:
Has anyone ever thought about opportunistic ANALYZE piggy-backing on
other full-table scans? That doesn't really help Greg, because his
complaint is mostly that a fresh ANALYZE is too expensive, but it
could be an interesting, albeit risky
On 12/10/2013 05:20 AM, Jim Nasby wrote:
FWIW, if synchronize_seqscans is on I'd think it'd be pretty easy to
fire up a 2nd backend to do the ANALYZE portion (or perhaps use Robert's
fancy new shared memory stuff).
Apologies for posting the same as a new idea before I saw your post. I
need
On 10/12/13 13:53, Mark Kirkwood wrote:
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out to
be very easy, patch attached. Your mileage may vary, but
On 10/12/13 15:04, Mark Kirkwood wrote:
On 10/12/13 13:53, Mark Kirkwood wrote:
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in ANALYZE. It turned out
to be very easy,
On 10/12/13 15:11, Mark Kirkwood wrote:
On 10/12/13 15:04, Mark Kirkwood wrote:
On 10/12/13 13:53, Mark Kirkwood wrote:
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I took a stab at using posix_fadvise() in
On 12/09/2013 02:37 PM, Robert Haas wrote:
I've never seen an n_distinct value of more than 5 digits, regardless
of reality. Typically I've seen 20-50k, even if the real number is
much higher. But the n_distinct value is only for non-MCVs, so if we
estimate the selectivity of column =
On 10/12/13 15:17, Mark Kirkwood wrote:
On 10/12/13 15:11, Mark Kirkwood wrote:
On 10/12/13 15:04, Mark Kirkwood wrote:
On 10/12/13 13:53, Mark Kirkwood wrote:
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On 10/12/13 12:14, Heikki Linnakangas wrote:
I
On 10/12/13 15:32, Mark Kirkwood wrote:
On 10/12/13 15:17, Mark Kirkwood wrote:
On 10/12/13 15:11, Mark Kirkwood wrote:
On 10/12/13 15:04, Mark Kirkwood wrote:
On 10/12/13 13:53, Mark Kirkwood wrote:
On 10/12/13 13:20, Mark Kirkwood wrote:
On 10/12/13 13:14, Mark Kirkwood wrote:
On
On Tue, Dec 10, 2013 at 12:13 AM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
Just one more...
The Intel 520 with ext4:
Without patch: ANALYZE pgbench_accounts 5s
With patch: ANALYZE pgbench_accounts 1s
And double checking -
With patch, but effective_io_concurrency = 1: ANALYZE
On 10/12/13 17:18, Claudio Freire wrote:
On Tue, Dec 10, 2013 at 12:13 AM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
Just one more...
The Intel 520 with ext4:
Without patch: ANALYZE pgbench_accounts 5s
With patch: ANALYZE pgbench_accounts 1s
And double checking -
With patch, but
On Sun, Dec 8, 2013 at 12:06 AM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
bench=# ANALYZE pgbench_accounts;
NOTICE: acquire sample will need 30000 blocks
NOTICE: sampled 30000 blocks
ANALYZE
Time: 10059.446 ms
bench=# \q
I did some experimenting here as well.
I hacked up a
On 12/08/2013 10:14 AM, Greg Stark wrote:
With rows_per_block=1 the MCV frequency list ranges from .0082 to .0123
With rows_per_block=4 the MCV frequency list ranges from .0063 to .0125
With rows_per_block=16 the MCV frequency list ranges from .0058 to .0164
With rows_per_block=64 the MCV