Re: [GENERAL] speeding up big query lookup

2006-08-29 Thread macgillivary
I just finished reading 'The Art of SQL' by Stephane Faroult who has a
chapter (ch 6) discussing this very topic.  I strongly recommend any
developer dealing with databases take a few days to read this
narrative.

A solution would seem to depend on whether you have many objects whose
measurements change only occasionally, or very few objects whose
measurements change very frequently.  If you have a chance, go to your
local big-name bookstore that allows you to sit and read, jump to ch 6
(page 156 or so), and get a good understanding of the various
techniques to take into consideration.

A good suggestion (imo) is the composite key already mentioned.  If I
may give an example from the book, for the case where you have many
objects whose measurements change occasionally:

select whatever
from object_val as outer
where outer.item_id = someValueForObjectX
  and outer.object_val_type_id = someValueForTypeA
  and outer.observation_date = ( select max(inner.observation_date)
                                 from object_val as inner
                                 where inner.item_id = someValueForObjectX
                                   and inner.object_val_type_id = someValueForTypeA
                                   and inner.observation_date <= yourReferenceDate )

Hopefully I haven't completely misunderstood the author's intentions
and given you some silly query.  Some may argue that inner.item_id
could be correlated to outer.item_id, but then the inner query would
need to be run multiple times.  If used as presented, the inner query
fires only once.
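For what it's worth, that correlated variant might look like the sketch
below.  It is only an illustration, not the book's exact code; note that
"inner" and "outer" are reserved words in SQL, so the aliases are renamed
here, and the someValue names are placeholders as above:

```sql
-- Correlated variant: the subquery references the outer row, so the
-- planner may re-evaluate it once per candidate row rather than once.
select whatever
from object_val as o
where o.item_id = someValueForObjectX
  and o.object_val_type_id = someValueForTypeA
  and o.observation_date = ( select max(i.observation_date)
                             from object_val as i
                             where i.item_id = o.item_id
                               and i.object_val_type_id = o.object_val_type_id
                               and i.observation_date <= yourReferenceDate )
```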

I'd be rather interested in knowing whether the above query (or
something similar) works well in practice, since what sounds good on
paper occasionally doesn't work in the real world - I'd like to know if
it gives you a good execution time in your situation.  I would first
take the step of creating the composite key suggested by Ragnar.

Good luck,
am.

Silvela, Jaime (Exchange) wrote:
 This is a question on speeding up some type of queries.

 I have a very big table that catalogs measurements of some objects over
 time. Measurements can be of several (~10) types. It keeps the
 observation date in a field, and indicates the type of measurement in
 another field.

 I often need to get the latest measurement of type A for object X.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] speeding up big query lookup

2006-08-29 Thread macgillivary
Just for fun, another approach since I believe pg supports it:

select whateverFields
from object_val as outer
where (outer.object_id, outer.object_val_type_id, outer.observation_date) IN
      (select inner.object_id, inner.object_val_type_id, max(inner.observation_date)
       from object_val as inner
       where inner.object_id = someValueForObjectX
         and inner.object_val_type_id = someValueForTypeA
         and inner.observation_date <= yourReferenceDate
       group by inner.object_id, inner.object_val_type_id)

These subqueries should run quickly because object_id,
object_val_type_id, and observation_date make up a composite key, so
the subquery should execute extremely fast, eliminating the majority of
the data before you display or act on other fields from object_val (as
outer).  I suppose if you don't need any further information from
object_val, and you are happy with the speeds, Tom's method is smooth.

Adding the ORDER BY clause will take you out of the 'relational world'
and thus slow you down.  My fear with the triggers and the separate
snapshot table is that the delays are spread out, adding questionable
complexity and potentially unnecessary overhead to the application.
Something else to consider (although admittedly a weak consideration in
some circumstances) is the extra space, the extra indexes, and other
factors such as the additional time the extra table adds to backup
routines (and restoration).

Best of luck,
am


Silvela, Jaime (Exchange) wrote:
 No, you can make this work just fine if you JOIN right.
 Your way is a more concise way of expressing it, though.

 Tom's trick

   SELECT DISTINCT ON (object_id, object_val_type_id) * from object_val
 ORDER BY object_id DESC, object_val_type_id DESC, observation_date
 DESC

 Runs about twice as fast as the GROUP BY ... HAVING, but definitely not
 as fast as keeping a separate table with only the latest observations,
 updated by triggers. I'll be testing out the differences in overall
 performance for my applications.
 
 Thanks for the suggestions,
 Jaime





Re: [GENERAL] speeding up big query lookup

2006-08-29 Thread Geoffrey

macgillivary wrote:

I just finished reading 'The Art of SQL' by Stephane Faroult who has a
chapter (ch 6) discussing this very topic.


I'd be curious to know any other references or books folks would
recommend when it comes to writing efficient SQL, as well as references
on database design.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] speeding up big query lookup

2006-08-29 Thread Silvela, Jaime (Exchange)
Thanks for the tips, am

Actually, your suggestion is equivalent to JOINing the table with a
GROUP BY copy of itself, and EXPLAIN shows both versions using the same
index and aggregates. Just a matter of style. Your previous suggestion
from the book works well too, but I actually prefer the JOIN method,
since that allows me to set the object_id and/or object_val_type values
in only one place.

Tom's method is faster, but it has a bit of obscurity against it: it's
finely tuned to a very specific behavior of DISTINCT ON and is less
easy to read than the others.

I fully agree that it is annoying to keep another table with triggers.
And of course, that table needs to be indexed too, or it's worthless.
I'm wondering how much extra time the db spends running all those
indexes and triggers, and I'll probably dismantle that in favor of the
composite index and the queries suggested so far.

I'll definitely check out that book; I've been looking for something
like that.

Thanks
Jaime


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of macgillivary
Sent: Monday, August 28, 2006 10:14 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] speeding up big query lookup





***
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] speeding up big query lookup

2006-08-28 Thread Alban Hertroys

Silvela, Jaime (Exchange) wrote:

The obvious way to get the latest measurement of type A would be to
join the table against

SELECT object_id, object_val_type_id, max(observation_date)
FROM object_val
GROUP BY object_id, object_val_type_id


I'm not sure this is actually the result you want; doesn't this give you 
all the unique (object_id, object_val_type_id)'s combined with the max 
observation_date in the table (as in, not necessarily related to the 
records listed)?


I'd think you want this:
 SELECT object_id, object_val_type_id, observation_date
 FROM object_val
 GROUP BY object_id, object_val_type_id, observation_date
 HAVING observation_date = max(observation_date)

Which'd return a single record with the highest observation_date. Though 
not strictly necessary, I can imagine you'd want observation_date to be 
unique, or you could get grouped observations with the same date.


Although ordering and limiting is probably faster.
I don't think the planner is intelligent enough to know that this would 
only return the record with the highest observation_date - it may be 
smart enough to reject (drop from the result set) found records after 
finding ones with a higher observation_date (which'd be interesting 
when using cursors) or something along those lines. Hmm... Now I'm all 
curious; an EXPLAIN'd be interesting...
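For anyone curious enough to try, the comparison could be run along these
lines (a sketch only; it assumes the object_val table from this thread and
the composite index Ragnar suggested):

```sql
-- EXPLAIN ANALYZE executes the query and reports the actual plan,
-- row counts, and timings, so the two approaches can be compared.
EXPLAIN ANALYZE
SELECT object_id, object_val_type_id, max(observation_date)
FROM object_val
GROUP BY object_id, object_val_type_id;

EXPLAIN ANALYZE
SELECT DISTINCT ON (object_id, object_val_type_id) *
FROM object_val
ORDER BY object_id DESC, object_val_type_id DESC, observation_date DESC;
```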


Sorry for the mostly useless post :P

Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] speeding up big query lookup

2006-08-28 Thread Silvela, Jaime (Exchange)
No, you can make this work just fine if you JOIN right.
Your way is a more concise way of expressing it, though.

Tom's trick

  SELECT DISTINCT ON (object_id, object_val_type_id) * from object_val
ORDER BY object_id DESC, object_val_type_id DESC, observation_date
DESC

Runs about twice as fast as the GROUP BY ... HAVING, but definitely not
as fast as keeping a separate table with only the latest observations,
updated by triggers. I'll be testing out the differences in overall
performance for my applications.

Thanks for the suggestions,
Jaime



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alban Hertroys
Sent: Monday, August 28, 2006 4:57 AM
To: Silvela, Jaime (Exchange)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] speeding up big query lookup


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] speeding up big query lookup

2006-08-27 Thread Tom Lane
Silvela, Jaime (Exchange) [EMAIL PROTECTED] writes:
 I have a very big table that catalogs measurements of some objects over
 time. Measurements can be of several (~10) types. It keeps the
 observation date in a field, and indicates the type of measurement in
 another field.

 I often need to get the latest measurement of type A for object X.

This is a pretty common requirement, and since plain SQL doesn't handle
it very well, different DBMSes have invented different extensions to
help.  For instance you can use LIMIT:

  SELECT * from object_val WHERE object_id = X and object_val_type_id = Y
ORDER BY object_id DESC, object_val_type_id DESC, observation_date DESC
LIMIT 1;

This will work very very fast if there is an index on (object_id,
object_val_type_id, observation_date) for it to use.  The only problem
with it is that there's no obvious way to extend it to fetch latest
measurements for several objects in one query.

Another way, which AFAIK is Postgres-only, is to use DISTINCT ON:

  SELECT DISTINCT ON (object_id, object_val_type_id) * from object_val
ORDER BY object_id DESC, object_val_type_id DESC, observation_date DESC

This can give you all the latest measurements at once, or any subset
you need (just add a WHERE clause).  It's not amazingly fast but it
generally beats the bog-standard-SQL alternatives, which as you
mentioned require joining against subselects.
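For instance, restricting it to a single object is just that WHERE clause
(X being a placeholder for the object id):

```sql
-- Latest measurement of each type, for object X only.
SELECT DISTINCT ON (object_id, object_val_type_id) *
FROM object_val
WHERE object_id = X
ORDER BY object_id DESC, object_val_type_id DESC, observation_date DESC;
```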

regards, tom lane



Re: [GENERAL] speeding up big query lookup

2006-08-26 Thread Ragnar
On Fri, 2006-08-25 at 18:34 -0400, Silvela, Jaime (Exchange) wrote:
 This is a question on speeding up some type of queries.
 
 I have a very big table that catalogs measurements of some objects over
 time. Measurements can be of several (~10) types. It keeps the
 observation date in a field, and indicates the type of measurement in
 another field.
 
 I often need to get the latest measurement of type A for object X.
 The table is indexed by object_id.

one popular way is to create a composite index:
CREATE INDEX object_val_id_type_date
    ON object_val (object_id,
                   object_val_type_id,
                   observation_date);

then you could

SELECT * FROM object_val
 WHERE object_id=?
   AND object_val_type_id=?
 ORDER BY observation_date DESC
 LIMIT 1;


Hope this helps
gnari




---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] speeding up big query lookup

2006-08-25 Thread Silvela, Jaime (Exchange)
This is a question on speeding up some type of queries.

I have a very big table that catalogs measurements of some objects over
time. Measurements can be of several (~10) types. It keeps the
observation date in a field, and indicates the type of measurement in
another field.

I often need to get the latest measurement of type A for object X.

The table is indexed by object_id.

The obvious way to get the latest measurement of type A would be to
join the table against

SELECT object_id, object_val_type_id, max(observation_date)
FROM object_val
GROUP BY object_id, object_val_type_id

But this can take a long time, and needs to be done very often.

Next strategy was to build an auxiliary table that just keeps the last
measurement of each type, for each object. I defined triggers to keep
this table up to date whenever the object_val table was updated.

However I don't trust this as much, and though it helps performance in
reads, I'm not sure it's the best option overall.
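For reference, a trigger of the kind described might be sketched roughly
as below.  This is only an illustration, not the poster's actual code:
latest_object_val, the val column, and the column types are all assumed
names, and the sketch ignores the race where two sessions insert the
first row for the same (object, type) pair concurrently - which is part
of why this machinery is hard to trust.

```sql
-- Assumed auxiliary table: one row per (object, measurement type).
CREATE TABLE latest_object_val (
    object_id          integer NOT NULL,
    object_val_type_id integer NOT NULL,
    observation_date   date    NOT NULL,
    val                numeric,
    PRIMARY KEY (object_id, object_val_type_id)
);

CREATE OR REPLACE FUNCTION refresh_latest_object_val() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM latest_object_val
               WHERE object_id = NEW.object_id
                 AND object_val_type_id = NEW.object_val_type_id) THEN
        -- Overwrite only if the incoming observation is at least as new.
        UPDATE latest_object_val
           SET observation_date = NEW.observation_date, val = NEW.val
         WHERE object_id = NEW.object_id
           AND object_val_type_id = NEW.object_val_type_id
           AND observation_date <= NEW.observation_date;
    ELSE
        INSERT INTO latest_object_val
        VALUES (NEW.object_id, NEW.object_val_type_id,
                NEW.observation_date, NEW.val);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER object_val_latest
    AFTER INSERT ON object_val
    FOR EACH ROW EXECUTE PROCEDURE refresh_latest_object_val();
```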

I found that Postgres can use indexes on pairs, so I'm going to play
with indexing object_val by (object_id, object_val_type_id), and I'm
sure my original JOIN query will be much faster.
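That JOIN, for reference, would look something like this (a sketch based
on the queries earlier in this thread):

```sql
-- Join each row against the latest observation per (object, type).
SELECT v.*
FROM object_val AS v
JOIN (SELECT object_id, object_val_type_id,
             max(observation_date) AS observation_date
      FROM object_val
      GROUP BY object_id, object_val_type_id) AS latest
  USING (object_id, object_val_type_id, observation_date);
```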

But I'm sure many people have this type of problem. Is there a smarter
way to deal with this? I'd love to be able to choose the row with max
value of some of the fields with just one statement, but I think this
can't be done?

Thanks
Jaime


