Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-06 Thread Merlin Moncure
On Wed, Oct 5, 2016 at 3:27 PM, Stephen Frost  wrote:
> Darren,
>
> * Darren Lafreniere (dlafreni...@onezero.com) wrote:
>> Tom Lane  wrote:
>> > > Gavin Wahl wrote:
>> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> > >> just find the page range with the largest/smallest value, and then only
>> > >> scan that one. Would that be hard to implement? I'm interested in
>> > working
>> > >> on it if someone can give me some pointers.
>> >
>> > I think this proposal is fairly broken anyway.  The page range with the
>> > largest max-value may once have contained the largest live row, but
>> > there's no guarantee that it still does.  It might even be completely
>> > empty.  You could imagine an algorithm like this:
>> >
>> > 1. Find page-range with largest max.  Scan it to identify live row with
>> > largest value.  If *no* live values, find page-range with next largest
>> > max, repeat until no page ranges remain (whereupon return NULL).
>> >
>> > 2. For each remaining page-range whose indexed max exceeds the value
>> > currently in hand, scan that page-range to see if any value exceeds
>> > the one in hand, replacing the value if so.
>> >
>> > This'd probably allow you to omit scanning some of the page-ranges
>> > in the table, but in a lot of cases you'd end up scanning many of them;
>> > and you'd need a lot of working state to remember which ranges you'd
>> > already looked at.  It'd certainly always be a lot more expensive than
>> > answering the same question with a btree index, because in no case do
>> > you get to avoid scanning the entire contents of the index.
> [...]
>> A b-tree index would certainly be faster for ordering. But in scenarios
>> where you have huge datasets that can't afford the space or update time
>> required for b-tree, could such a BRIN-accelerated ordering algorithm at
>> least be faster than ordering with no index?
>
> For at least some of the common BRIN use-cases, where the rows are
> inserted in-order and never/very-rarely modified or deleted, this
> approach would work very well.
>
> Certainly, using this would be much cheaper than a seqscan/top-N sort,
> for small values of 'N', relative to the number of rows in the table,
> in those cases.
>
> In general, I like the idea of supporting this as BRIN indexes strike me
> as very good for very large tables which have highly clumped data in
> them and being able to do a top-N query on those can be very useful at
> times.

Yeah.  If the brin average page overlap and % dead tuple coefficients
are low it absolutely makes sense to drive top N with brin.  It will
never beat a btree but typically brin is used when the btree index is
no good for various reasons.

brin indexes are pretty neat; they can provide stupefying amounts of
optimization in many common warehousing workloads.   They even beat
out index only scans for a tiny fraction of the storage.  Of course,
you have to work around the limitations... :-)

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Stephen Frost  wrote:

> For at least some of the common BRIN use-cases, where the rows are
> inserted in-order and never/very-rarely modified or deleted, this
> approach would work very well.
>

Thanks Stephen, this is exactly our use case.


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Stephen Frost
Darren,

* Darren Lafreniere (dlafreni...@onezero.com) wrote:
> Tom Lane  wrote:
> > > Gavin Wahl wrote:
> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> > >> just find the page range with the largest/smallest value, and then only
> > >> scan that one. Would that be hard to implement? I'm interested in
> > working
> > >> on it if someone can give me some pointers.
> >
> > I think this proposal is fairly broken anyway.  The page range with the
> > largest max-value may once have contained the largest live row, but
> > there's no guarantee that it still does.  It might even be completely
> > empty.  You could imagine an algorithm like this:
> >
> > 1. Find page-range with largest max.  Scan it to identify live row with
> > largest value.  If *no* live values, find page-range with next largest
> > max, repeat until no page ranges remain (whereupon return NULL).
> >
> > 2. For each remaining page-range whose indexed max exceeds the value
> > currently in hand, scan that page-range to see if any value exceeds
> > the one in hand, replacing the value if so.
> >
> > This'd probably allow you to omit scanning some of the page-ranges
> > in the table, but in a lot of cases you'd end up scanning many of them;
> > and you'd need a lot of working state to remember which ranges you'd
> > already looked at.  It'd certainly always be a lot more expensive than
> > answering the same question with a btree index, because in no case do
> > you get to avoid scanning the entire contents of the index.
[...]
> A b-tree index would certainly be faster for ordering. But in scenarios
> where you have huge datasets that can't afford the space or update time
> required for b-tree, could such a BRIN-accelerated ordering algorithm at
> least be faster than ordering with no index?

For at least some of the common BRIN use-cases, where the rows are
inserted in-order and never/very-rarely modified or deleted, this
approach would work very well.

Certainly, using this would be much cheaper than a seqscan/top-N sort,
for small values of 'N', relative to the number of rows in the table,
in those cases.

In general, I like the idea of supporting this as BRIN indexes strike me
as very good for very large tables which have highly clumped data in
them and being able to do a top-N query on those can be very useful at
times.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Tom Lane  wrote:

> > Gavin Wahl wrote:
> >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> >> just find the page range with the largest/smallest value, and then only
> >> scan that one. Would that be hard to implement? I'm interested in
> working
> >> on it if someone can give me some pointers.
>
> I think this proposal is fairly broken anyway.  The page range with the
> largest max-value may once have contained the largest live row, but
> there's no guarantee that it still does.  It might even be completely
> empty.  You could imagine an algorithm like this:
>
> 1. Find page-range with largest max.  Scan it to identify live row with
> largest value.  If *no* live values, find page-range with next largest
> max, repeat until no page ranges remain (whereupon return NULL).
>
> 2. For each remaining page-range whose indexed max exceeds the value
> currently in hand, scan that page-range to see if any value exceeds
> the one in hand, replacing the value if so.
>
> This'd probably allow you to omit scanning some of the page-ranges
> in the table, but in a lot of cases you'd end up scanning many of them;
> and you'd need a lot of working state to remember which ranges you'd
> already looked at.  It'd certainly always be a lot more expensive than
> answering the same question with a btree index, because in no case do
> you get to avoid scanning the entire contents of the index.
>
> regards, tom lane
>


Thanks Tom,

A b-tree index would certainly be faster for ordering. But in scenarios
where you have huge datasets that can't afford the space or update time
required for b-tree, could such a BRIN-accelerated ordering algorithm at
least be faster than ordering with no index?

Darren Lafreniere


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Tom Lane
Alvaro Herrera  writes:
> Darren Lafreniere wrote:
>> We found a pgsql-hackers thread from about a year ago about optimizing
>> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
>> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us

> Tom said he was working on some infrastructure planner changes
> ("upper-planner path-ification"), not that he was working on improving
> usage of BRIN indexes.  As far as I know, nobody has worked on that.

Alvaro's reading is correct; I wasn't planning to work on any such thing,
and still am not.

Looking again at the original thread:

> Gavin Wahl wrote:
>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> just find the page range with the largest/smallest value, and then only
>> scan that one. Would that be hard to implement? I'm interested in working
>> on it if someone can give me some pointers.

I think this proposal is fairly broken anyway.  The page range with the
largest max-value may once have contained the largest live row, but
there's no guarantee that it still does.  It might even be completely
empty.  You could imagine an algorithm like this:

1. Find page-range with largest max.  Scan it to identify live row with
largest value.  If *no* live values, find page-range with next largest
max, repeat until no page ranges remain (whereupon return NULL).

2. For each remaining page-range whose indexed max exceeds the value
currently in hand, scan that page-range to see if any value exceeds
the one in hand, replacing the value if so.

This'd probably allow you to omit scanning some of the page-ranges
in the table, but in a lot of cases you'd end up scanning many of them;
and you'd need a lot of working state to remember which ranges you'd
already looked at.  It'd certainly always be a lot more expensive than
answering the same question with a btree index, because in no case do
you get to avoid scanning the entire contents of the index.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Ahh, yes. I misread that. Thank you for the clarification.

On Wed, Oct 5, 2016 at 2:27 PM, Alvaro Herrera 
wrote:

> Darren Lafreniere wrote:
>
> > "In addition to simply finding the rows to be returned by a query, an
> index
> > may be able to deliver them in a specific sorted order. This allows a
> > query's ORDER BY specification to be honored without a separate sorting
> > step. Of the index types currently supported by PostgreSQL, only B-tree
> can
> > produce sorted output — the other index types return matching rows in an
> > unspecified, implementation-dependent order."
> >
> > We found a pgsql-hackers thread from about a year ago about optimizing
> > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
> > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us
>
> Tom said he was working on some infrastructure planner changes
> ("upper-planner path-ification"), not that he was working on improving
> usage of BRIN indexes.  As far as I know, nobody has worked on that.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Alvaro Herrera
Darren Lafreniere wrote:

> "In addition to simply finding the rows to be returned by a query, an index
> may be able to deliver them in a specific sorted order. This allows a
> query's ORDER BY specification to be honored without a separate sorting
> step. Of the index types currently supported by PostgreSQL, only B-tree can
> produce sorted output — the other index types return matching rows in an
> unspecified, implementation-dependent order."
> 
> We found a pgsql-hackers thread from about a year ago about optimizing
> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us

Tom said he was working on some infrastructure planner changes
("upper-planner path-ification"), not that he was working on improving
usage of BRIN indexes.  As far as I know, nobody has worked on that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Hello,

We're curious about the current behavior in 9.5.4, and possible future
enhancements, of BRIN indexes with respect to ordering.

In the docs, section 11.4. "Indexes and ORDER BY" (
https://www.postgresql.org/docs/9.5/static/indexes-ordering.html) is clear
that anything other than B-tree indexes have unspecified ordering:

"In addition to simply finding the rows to be returned by a query, an index
may be able to deliver them in a specific sorted order. This allows a
query's ORDER BY specification to be honored without a separate sorting
step. Of the index types currently supported by PostgreSQL, only B-tree can
produce sorted output — the other index types return matching rows in an
unspecified, implementation-dependent order."

We found a pgsql-hackers thread from about a year ago about optimizing
ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us


Our current test shows that ordering by a BRIN indexed column still
performs an unoptimized sort:

SELECT generate_series(1, 1000) AS id INTO test;
CREATE INDEX idx_test_id ON test USING BRIN (id);
EXPLAIN SELECT id FROM test ORDER BY id DESC LIMIT 20;

Limit  (cost=410344.40..410344.45 rows=20 width=4)
  ->  Sort  (cost=410344.40..435344.40 rows=100 width=4)"
Sort Key: id DESC
->  Seq Scan on test  (cost=0.00..144248.00 rows=1000 width=4)

Is there anything we're missing to speed this up? Or is it still a future
feature?

Thank you,
Darren Lafreniere