Re: same query, two different approaches, vastly different performance

2012-02-17 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 query 2 is at least an order of magnitude slower than query 1

To follow up on this, the current best practice for handling this 
is to prepare two statements, and have your app use the correct one. 
One could be prepared with pg_server_prepare=0, and one as normal.

That's a general setup that may not apply to your exact situation, 
but we can't tell what your actual arguments are and how often you 
call them based on your example. The idea is to create different 
statements (perhaps even more than two) to account for the different 
plans that are created, based on the information available to the 
planner. Your app can then pick the appropriate one to use based on 
what parameters are being sent.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201202171158
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk8+iN4ACgkQvJuQZxSWSsjZ0ACdG+U7Ux7mdNs8NQ2W/8J/HYPA
i1kAn0FSO5ukuG6mvbIuboq2iRvAlg2e
=S3U1
-END PGP SIGNATURE-




Re: same query, two different approaches, vastly different performance

2012-02-17 Thread Puneet Kishor

On Feb 17, 2012, at 11:05 AM, Greg Sabino Mullane wrote:

 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: RIPEMD160
 
 
 query 2 is at least an order of magnitude slower than query 1
 
 To follow up on this, the current best practice for handling this 
 is to prepare two statements, and have your app use the correct one. 
 One could be prepared with pg_server_prepare=0, and one as normal.


Now that I know the reason behind this, thanks to all of you, I have decided to 
stick with inline params `LIKE $q` instead of bind values. I am not too worried 
about SQL attacks, and the above strategy works well without having to toggle 
`pg_server_prepare`. Also, I do expect the query to run often, but as the 
frontend is a web app, it will be called in separate sessions... so, think many 
reloads of the same page as opposed to one page request firing many instances 
of the same query.


Many thanks Greg.


 
 That's a general setup that may not apply to your exact situation, 
 but we can't tell what your actual arguments are and how often you 
 call them based on your example. The idea is to create different 
 statements (perhaps even more than two) to account for the different 
 plans that are created, based on the information available to the 
 planner. Your app can then pick the appropriate one to use based on 
 what parameters are being sent.
 
 - -- 
 Greg Sabino Mullane g...@turnstep.com
 End Point Corporation http://www.endpoint.com/
 PGP Key: 0x14964AC8 201202171158
 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
 -BEGIN PGP SIGNATURE-
 
 iEYEAREDAAYFAk8+iN4ACgkQvJuQZxSWSsjZ0ACdG+U7Ux7mdNs8NQ2W/8J/HYPA
 i1kAn0FSO5ukuG6mvbIuboq2iRvAlg2e
 =S3U1
 -END PGP SIGNATURE-
 
 



Re: same query, two different approaches, vastly different performance

2012-02-17 Thread Bill Ward
On Fri, Feb 17, 2012 at 9:18 AM, Puneet Kishor punk.k...@gmail.com wrote:


 On Feb 17, 2012, at 11:05 AM, Greg Sabino Mullane wrote:

 
  -BEGIN PGP SIGNED MESSAGE-
  Hash: RIPEMD160
 
 
  query 2 is at least an order of magnitude slower than query 1
 
  To follow up on this, the current best practice for handling this
  is to prepare two statements, and have your app use the correct one.
  One could be prepared with pg_server_prepare=0, and one as normal.


 Now that I know the reason behind this, thanks to all of you, I have
 decided to stick with inline params `LIKE $q` instead of bind values. I am
 not too worried about SQL attacks, and the above strategy works well
 without having to toggle `pg_server_prepare`. Also, I do expect the query
 to run often, but as the frontend is a web app, it will be called in
 separate sessions... so, think many reloads of the same page as opposed to
 one page request firing many instances of the same query.


You can be reasonably safe using inline params like that if you're careful
to make sure $q contains only the sort of characters that make sense for
your app, and/or if $q is quoted properly. You can use a regex to test that
it is only alphanumeric for example. It's just easier to guarantee safety
if you use bind values... but that's not always practical as you have
discovered.

-- 
Check out my LEGO blog at http://www.brickpile.com
Follow/friend me: facebook.com/billward • flickr.com/photos/billward •
twitter.com/williamward


Re: same query, two different approaches, vastly different performance

2012-02-17 Thread Chad Wallace
On Fri, 17 Feb 2012 11:18:39 -0600
Puneet Kishor punk.k...@gmail.com wrote:

  query 2 is at least an order of magnitude slower than query 1
 
 Now that I know the reason behind this, thanks to all of you, I have
 decided to stick with inline params `LIKE $q` instead of bind values.
 I am not too worried about SQL attacks, and the above strategy works
 well without having to toggle `pg_server_prepare`. Also, I do expect
 the query to run often, but as the frontend is a web app, it will be
 called in separate sessions... so, think many reloads of the same
 page as opposed to one page request firing many instances of the same
 query.

Unless you're cleaning up the input in some other way, I would suggest
you use $dbh-quote() on $q before interpolating it into your query,
like so:

my $quoted = $dbh-quote( $q );
$dbh-do( ... LIKE $quoted );

It's a Good Idea--regardless of whether you're worried. :-)


-- 

C. Chad Wallace, B.Sc.
The Lodging Company
http://www.lodgingcompany.com/
OpenPGP Public Key ID: 0x262208A0



Re: same query, two different approaches, vastly different performance

2012-02-17 Thread Douglas Wilson
On Feb 17, 9:59 am, b...@wards.net (Bill Ward) wrote:

 You can be reasonably safe using inline params like that if you're careful
 to make sure $q contains only the sort of characters that make sense for
 your app, and/or if $q is quoted properly. You can use a regex to test that
 it is only alphanumeric for example. It's just easier to guarantee safety
 if you use bind values... but that's not always practical as you have
 discovered.


Don't use a regex. Use the -quote() method. That's what it's for.



Re: same query, two different approaches, vastly different performance

2012-02-16 Thread Douglas Wilson
On Feb 13, 4:46 pm, punk.k...@gmail.com (Puneet Kishor) wrote:
 I asked this on Stackoverflow and on Perlmonks, but hopefully I will get a 
 more satisfactory and revealing insight straight from the DBI folks.

I don't know why you need to ask this in three different forums. As
you've been told now more than once, it has to do with preparing a
query plan without know yet what the arguments are.



same query, two different approaches, vastly different performance

2012-02-13 Thread Puneet Kishor
I asked this on Stackoverflow and on Perlmonks, but hopefully I will get a more 
satisfactory and revealing insight straight from the DBI folks.

I have a Postgres table with more than 8 million rows. Given the following two 
ways of doing the same query, I get wildly different results.

$q .= '%';

## query 1
my $sql = qq{
SELECT a, b, c
FROM t 
WHERE Lower( a ) LIKE '$q'
};
my $sth1 = $dbh-prepare($sql);
$sth1-execute();

## query 2
my $sth2 = $dbh-prepare(qq{
SELECT a, b, c
FROM t  
WHERE Lower( a ) LIKE ?
});
$sth2-execute($q);


query 2 is at least an order of magnitude slower than query 1... seems like it 
is not using the indexes, while query 1 is using the index. And, yes, I have 
tried it with query 2 before query 1 to eliminate caching. There is a 
difference, but I don't know why, so would love hear an explanation.

Many tia,


--
Puneet Kishor

Re: same query, two different approaches, vastly different performance

2012-02-13 Thread Puneet Kishor

On Feb 13, 2012, at 6:57 PM, David Nicol wrote:

 did you find
 
 http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations
 
 in your research?


Thanks for that link, but I don't think that applies in my case. I have two 
exact queries, with a postfix wildcard search. One, I prepare with the search 
term embedded in the query. The other is prepared with a bind variable and the 
search term is added during $sth-execute.

The first one uses the index. The second does a full-table scan.

Strange, but true.



--
Puneet Kishor

Re: same query, two different approaches, vastly different performance

2012-02-13 Thread Puneet Kishor

On Feb 13, 2012, at 7:12 PM, Mike Blackwell wrote:

 Check the postgres-performance list archives.  I believe the difference is 
 due to the query planner having less info available with parameters vs inline 
 args.  
 



a... that makes sense. I would like to confirm this, because, if true, then 
it is a strike against statements prepared with bind values.


 
 
 On Feb 13, 2012, at 19:07, Puneet Kishor punk.k...@gmail.com wrote:
 
 
 On Feb 13, 2012, at 6:57 PM, David Nicol wrote:
 
 did you find
 
 http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations
 
 in your research?
 
 
 Thanks for that link, but I don't think that applies in my case. I have two 
 exact queries, with a postfix wildcard search. One, I prepare with the search 
 term embedded in the query. The other is prepared with a bind variable and 
 the search term is added during $sth-execute.
 
 The first one uses the index. The second does a full-table scan.
 
 Strange, but true.
 
 
 
 --
 Puneet Kishor



Re: same query, two different approaches, vastly different performance

2012-02-13 Thread Puneet Kishor

On Feb 13, 2012, at 7:59 PM, Mike Blackwell wrote:

 Yes. On the other hand, parameters are inherently safe from injection attacks 
 and other data issues, so which to use depends on the situation. 


Yup, I am aware of this. In this case, definitely, there is a difference of a 
couple of hundred ms for a query vs. a couple of three seconds for the same 
query. So, I am going to stick with the inline params.


 
 I'm guessing you already know about EXPLAIN.  There's an auto-explain 
 extension in the contrib folder that might help you see what's going on as 
 well.  
 

Yeah, I will dig into it to see if I can figure out something. However, I am 
not DBI-savvy enough to understand it all.

Thankfully, after much finagling, trials, errors, and various different kinds 
of indexes, I am now getting the results in usable amount of time.



 
 
 On Feb 13, 2012, at 19:16, Puneet Kishor punk.k...@gmail.com wrote:
 
 
 On Feb 13, 2012, at 7:12 PM, Mike Blackwell wrote:
 
 Check the postgres-performance list archives.  I believe the difference is 
 due to the query planner having less info available with parameters vs 
 inline args.  
 
 
 
 
 a... that makes sense. I would like to confirm this, because, if true, 
 then it is a strike against statements prepared with bind values.
 
 
 
 
 On Feb 13, 2012, at 19:07, Puneet Kishor punk.k...@gmail.com wrote:
 
 
 On Feb 13, 2012, at 6:57 PM, David Nicol wrote:
 
 did you find
 
 http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations
 
 in your research?
 
 
 Thanks for that link, but I don't think that applies in my case. I have two 
 exact queries, with a postfix wildcard search. One, I prepare with the 
 search term embedded in the query. The other is prepared with a bind 
 variable and the search term is added during $sth-execute.
 
 The first one uses the index. The second does a full-table scan.
 
 Strange, but true.
 
 
 
 --
 Puneet Kishor
 



Re: same query, two different approaches, vastly different performance

2012-02-13 Thread David Nicol
On Mon, Feb 13, 2012 at 7:16 PM, Puneet Kishor



 a... that makes sense. I would like to confirm this, because, if true,
 then it is a strike against statements prepared with bind values.


the postgres reference about when LIKE statements get to use indices states
that they are only available for /constants/ ending with %. It seems like
this would be simple enough to fix, if you want to be a hero and get a
patch accepted into Postgres.

It's in the fourth paragraph at
http://www.postgresql.org/docs/8.4/static/indexes-types.html


Also there is interesting stuff about creating indices for functions, so
you can index on lower(a) and then queries using lower(a) will get
optimized.
After reading that bit of the postgres docs I wonder if doing something like

lower(a) like ('^' || ?)

would allow the index to be used. Probably not, as the result would not be
a constant, even though it would be anchored to the beginning.


Re: same query, two different approaches, vastly different performance

2012-02-13 Thread Puneet Kishor
ok. I am completely lost here.


On Feb 13, 2012, at 8:10 PM, David Nicol wrote:

 On Mon, Feb 13, 2012 at 7:16 PM, Puneet Kishor
 
 
 
 a... that makes sense. I would like to confirm this, because, if true,
 then it is a strike against statements prepared with bind values.
 
 
 the postgres reference about when LIKE statements get to use indices states
 that they are only available for /constants/ ending with %. It seems like
 this would be simple enough to fix, if you want to be a hero and get a
 patch accepted into Postgres.
 
 It's in the fourth paragraph at
 http://www.postgresql.org/docs/8.4/static/indexes-types.html
 


why should a patch be required in Postgres? Seems to extremely naive pov that 
this is an issue with DBI.


 
 Also there is interesting stuff about creating indices for functions, so
 you can index on lower(a) and then queries using lower(a) will get
 optimized.
 After reading that bit of the postgres docs I wonder if doing something like
 
lower(a) like ('^' || ?)
 
 would allow the index to be used. Probably not, as the result would not be
 a constant, even though it would be anchored to the beginning.


So, the index I have created is as follows

CREATE INDEX idxa ON t (Lower(a) varchar_pattern_ops);

This is a web app (although I have done tests using standalone, non-web, 
scripts as well). The user types a few characters that are received in $q. I 
then do

$q = lc($q) . '%';

followed by

my $sql = qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE '$q'};
my $sth = $dbh-prepare($sql);
$sth-execute();

The above takes, say, 200-400 ms depending on the term. Or, in the case of the 
query that is really slow

my $sth = $dbh-prepare(qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE 
?});
$sth-execute($q);

The above takes 3 to 4 seconds. Pretty much identical queries with identical 
results by vastly difference performance.

Isn't this a DBI issue instead of a Postgres issue?

--
Puneet Kishor

Re: same query, two different approaches, vastly different performance

2012-02-13 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 my $sth = $dbh-prepare(qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE ?});
 $sth-execute($q);

 The above takes 3 to 4 seconds. Pretty much identical queries 
 with identical results by vastly difference performance.

 Isn't this a DBI issue instead of a Postgres issue?

It is neither. A query with a ? in it can be *anything*, so Postgres 
must design a plan that can handle it (in this case, a plan that 
does a sequential scan). It cannot know in advance if it the query 
is to be anchored or not. It's also not an option to expect DBI to 
keep track of what arguments you pass and create new plans based 
on the arguments.

When you do that prepare and execute, DBD::Pg is asking Postgres to 
create a prepared statement, such that it can send just the arguments, 
and not the full statement, each time execute() is called. You can 
force it to *not* do so by issuing:

$dbh-{pg_server_prepare} = 0;

You can also do this at the statement handle generation time: see the 
DBD::Pg docs for more information.

There are a lot of other considerations and things to think about when 
doing a LIKE $1 in Postgres, and in general the use of an unconstrained 
parameter indicates a need for something like full-text search, but the 
above attribute should allow you to see the same behavior on both of your 
original queries.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201202132246
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk852mcACgkQvJuQZxSWSsh/TACgkmvbALYR4WsvaglWn1nWD050
8jEAoPuY/Imn0eYOGll1+gWY8UHdKHwz
=UI4w
-END PGP SIGNATURE-




Re: same query, two different approaches, vastly different performance

2012-02-13 Thread David Nicol
On Mon, Feb 13, 2012 at 9:52 PM, Greg Sabino Mullane

 When you do that prepare and execute, DBD::Pg is asking Postgres to
 create a prepared statement, such that it can send just the arguments,
 and not the full statement, each time execute() is called. You can
 force it to *not* do so by issuing:

 $dbh-{pg_server_prepare} = 0;

 You can also do this at the statement handle generation time: see the
 DBD::Pg docs for more information.


That's awesome. Here's the answer to the PK's other question:

The proposed patch to Postgres would cause plan selection to be deferred,
in case of LIKE placeholder, until after pg can look at the
placeholder(s), to see if it's anchored at the front. The resulting trick
pg would generate 2**n plans at prepare time, where n is the number of
placeholders that may or may not be anchored at the front. Extra extra
points for figuring out that '^'||placeholder will always be so anchored,
so it can use a sorted index, even though it might have to filter the
results from that. With such a hack in place, and turned on by default,
there would be one fewer situation where the pg_server_prepare attribute
could be left on.

-- 
on the outside of the bottom of the barrel, that's how much of an outlier
it is.