Re: same query, two different approaches, vastly different performance
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 query 2 is at least an order of magnitude slower than query 1 To follow up on this, the current best practice for handling this is to prepare two statements, and have your app use the correct one. One could be prepared with pg_server_prepare=0, and one as normal. That's a general setup that may not apply to your exact situation, but we can't tell what your actual arguments are and how often you call them based on your example. The idea is to create different statements (perhaps even more than two) to account for the different plans that are created, based on the information available to the planner. Your app can then pick the appropriate one to use based on what parameters are being sent. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201202171158 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAk8+iN4ACgkQvJuQZxSWSsjZ0ACdG+U7Ux7mdNs8NQ2W/8J/HYPA i1kAn0FSO5ukuG6mvbIuboq2iRvAlg2e =S3U1 -END PGP SIGNATURE-
Re: same query, two different approaches, vastly different performance
On Feb 17, 2012, at 11:05 AM, Greg Sabino Mullane wrote: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 query 2 is at least an order of magnitude slower than query 1 To follow up on this, the current best practice for handling this is to prepare two statements, and have your app use the correct one. One could be prepared with pg_server_prepare=0, and one as normal. Now that I know the reason behind this, thanks to all of you, I have decided to stick with inline params `LIKE $q` instead of bind values. I am not too worried about SQL attacks, and the above strategy works well without having to toggle `pg_server_prepare`. Also, I do expect the query to run often, but as the frontend is a web app, it will be called in separate sessions... so, think many reloads of the same page as opposed to one page request firing many instances of the same query. Many thanks Greg. That's a general setup that may not apply to your exact situation, but we can't tell what your actual arguments are and how often you call them based on your example. The idea is to create different statements (perhaps even more than two) to account for the different plans that are created, based on the information available to the planner. Your app can then pick the appropriate one to use based on what parameters are being sent. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201202171158 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAk8+iN4ACgkQvJuQZxSWSsjZ0ACdG+U7Ux7mdNs8NQ2W/8J/HYPA i1kAn0FSO5ukuG6mvbIuboq2iRvAlg2e =S3U1 -END PGP SIGNATURE-
Re: same query, two different approaches, vastly different performance
On Fri, Feb 17, 2012 at 9:18 AM, Puneet Kishor punk.k...@gmail.com wrote: On Feb 17, 2012, at 11:05 AM, Greg Sabino Mullane wrote: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 query 2 is at least an order of magnitude slower than query 1 To follow up on this, the current best practice for handling this is to prepare two statements, and have your app use the correct one. One could be prepared with pg_server_prepare=0, and one as normal. Now that I know the reason behind this, thanks to all of you, I have decided to stick with inline params `LIKE $q` instead of bind values. I am not too worried about SQL attacks, and the above strategy works well without having to toggle `pg_server_prepare`. Also, I do expect the query to run often, but as the frontend is a web app, it will be called in separate sessions... so, think many reloads of the same page as opposed to one page request firing many instances of the same query. You can be reasonably safe using inline params like that if you're careful to make sure $q contains only the sort of characters that make sense for your app, and/or if $q is quoted properly. You can use a regex to test that it is only alphanumeric for example. It's just easier to guarantee safety if you use bind values... but that's not always practical as you have discovered. -- Check out my LEGO blog at http://www.brickpile.com Follow/friend me: facebook.com/billward • flickr.com/photos/billward • twitter.com/williamward
Re: same query, two different approaches, vastly different performance
On Fri, 17 Feb 2012 11:18:39 -0600 Puneet Kishor punk.k...@gmail.com wrote: query 2 is at least an order of magnitude slower than query 1 Now that I know the reason behind this, thanks to all of you, I have decided to stick with inline params `LIKE $q` instead of bind values. I am not too worried about SQL attacks, and the above strategy works well without having to toggle `pg_server_prepare`. Also, I do expect the query to run often, but as the frontend is a web app, it will be called in separate sessions... so, think many reloads of the same page as opposed to one page request firing many instances of the same query. Unless you're cleaning up the input in some other way, I would suggest you use $dbh-quote() on $q before interpolating it into your query, like so: my $quoted = $dbh-quote( $q ); $dbh-do( ... LIKE $quoted ); It's a Good Idea--regardless of whether you're worried. :-) -- C. Chad Wallace, B.Sc. The Lodging Company http://www.lodgingcompany.com/ OpenPGP Public Key ID: 0x262208A0
Re: same query, two different approaches, vastly different performance
On Feb 17, 9:59 am, b...@wards.net (Bill Ward) wrote: You can be reasonably safe using inline params like that if you're careful to make sure $q contains only the sort of characters that make sense for your app, and/or if $q is quoted properly. You can use a regex to test that it is only alphanumeric for example. It's just easier to guarantee safety if you use bind values... but that's not always practical as you have discovered. Don't use a regex. Use the -quote() method. That's what it's for.
Re: same query, two different approaches, vastly different performance
On Feb 13, 4:46 pm, punk.k...@gmail.com (Puneet Kishor) wrote: I asked this on Stackoverflow and on Perlmonks, but hopefully I will get a more satisfactory and revealing insight straight from the DBI folks. I don't know why you need to ask this in three different forums. As you've been told now more than once, it has to do with preparing a query plan without know yet what the arguments are.
same query, two different approaches, vastly different performance
I asked this on Stackoverflow and on Perlmonks, but hopefully I will get a more satisfactory and revealing insight straight from the DBI folks. I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query, I get wildly different results. $q .= '%'; ## query 1 my $sql = qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE '$q' }; my $sth1 = $dbh-prepare($sql); $sth1-execute(); ## query 2 my $sth2 = $dbh-prepare(qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE ? }); $sth2-execute($q); query 2 is at least an order of magnitude slower than query 1... seems like it is not using the indexes, while query 1 is using the index. And, yes, I have tried it with query 2 before query 1 to eliminate caching. There is a difference, but I don't know why, so would love hear an explanation. Many tia, -- Puneet Kishor
Re: same query, two different approaches, vastly different performance
On Feb 13, 2012, at 6:57 PM, David Nicol wrote: did you find http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations in your research? Thanks for that link, but I don't think that applies in my case. I have two exact queries, with a postfix wildcard search. One, I prepare with the search term embedded in the query. The other is prepared with a bind variable and the search term is added during $sth-execute. The first one uses the index. The second does a full-table scan. Strange, but true. -- Puneet Kishor
Re: same query, two different approaches, vastly different performance
On Feb 13, 2012, at 7:12 PM, Mike Blackwell wrote: Check the postgres-performance list archives. I believe the difference is due to the query planner having less info available with parameters vs inline args. a... that makes sense. I would like to confirm this, because, if true, then it is a strike against statements prepared with bind values. On Feb 13, 2012, at 19:07, Puneet Kishor punk.k...@gmail.com wrote: On Feb 13, 2012, at 6:57 PM, David Nicol wrote: did you find http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations in your research? Thanks for that link, but I don't think that applies in my case. I have two exact queries, with a postfix wildcard search. One, I prepare with the search term embedded in the query. The other is prepared with a bind variable and the search term is added during $sth-execute. The first one uses the index. The second does a full-table scan. Strange, but true. -- Puneet Kishor
Re: same query, two different approaches, vastly different performance
On Feb 13, 2012, at 7:59 PM, Mike Blackwell wrote: Yes. On the other hand, parameters are inherently safe from injection attacks and other data issues, so which to use depends on the situation. Yup, I am aware of this. In this case, definitely, there is a difference of a couple of hundred ms for a query vs. a couple of three seconds for the same query. So, I am going to stick with the inline params. I'm guessing you already know about EXPLAIN. There's an auto-explain extension in the contrib folder that might help you see what's going on as well. Yeah, I will dig into it to see if I can figure out something. However, I am not DBI-savvy enough to understand it all. Thankfully, after much finagling, trials, errors, and various different kinds of indexes, I am now getting the results in usable amount of time. On Feb 13, 2012, at 19:16, Puneet Kishor punk.k...@gmail.com wrote: On Feb 13, 2012, at 7:12 PM, Mike Blackwell wrote: Check the postgres-performance list archives. I believe the difference is due to the query planner having less info available with parameters vs inline args. a... that makes sense. I would like to confirm this, because, if true, then it is a strike against statements prepared with bind values. On Feb 13, 2012, at 19:07, Puneet Kishor punk.k...@gmail.com wrote: On Feb 13, 2012, at 6:57 PM, David Nicol wrote: did you find http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations in your research? Thanks for that link, but I don't think that applies in my case. I have two exact queries, with a postfix wildcard search. One, I prepare with the search term embedded in the query. The other is prepared with a bind variable and the search term is added during $sth-execute. The first one uses the index. The second does a full-table scan. Strange, but true. -- Puneet Kishor
Re: same query, two different approaches, vastly different performance
On Mon, Feb 13, 2012 at 7:16 PM, Puneet Kishor a... that makes sense. I would like to confirm this, because, if true, then it is a strike against statements prepared with bind values. the postgres reference about when LIKE statements get to use indices states that they are only available for /constants/ ending with %. It seems like this would be simple enough to fix, if you want to be a hero and get a patch accepted into Postgres. It's in the fourth paragraph at http://www.postgresql.org/docs/8.4/static/indexes-types.html Also there is interesting stuff about creating indices for functions, so you can index on lower(a) and then queries using lower(a) will get optimized. After reading that bit of the postgres docs I wonder if doing something like lower(a) like ('^' || ?) would allow the index to be used. Probably not, as the result would not be a constant, even though it would be anchored to the beginning.
Re: same query, two different approaches, vastly different performance
ok. I am completely lost here. On Feb 13, 2012, at 8:10 PM, David Nicol wrote: On Mon, Feb 13, 2012 at 7:16 PM, Puneet Kishor a... that makes sense. I would like to confirm this, because, if true, then it is a strike against statements prepared with bind values. the postgres reference about when LIKE statements get to use indices states that they are only available for /constants/ ending with %. It seems like this would be simple enough to fix, if you want to be a hero and get a patch accepted into Postgres. It's in the fourth paragraph at http://www.postgresql.org/docs/8.4/static/indexes-types.html why should a patch be required in Postgres? Seems to extremely naive pov that this is an issue with DBI. Also there is interesting stuff about creating indices for functions, so you can index on lower(a) and then queries using lower(a) will get optimized. After reading that bit of the postgres docs I wonder if doing something like lower(a) like ('^' || ?) would allow the index to be used. Probably not, as the result would not be a constant, even though it would be anchored to the beginning. So, the index I have created is as follows CREATE INDEX idxa ON t (Lower(a) varchar_pattern_ops); This is a web app (although I have done tests using standalone, non-web, scripts as well). The user types a few characters that are received in $q. I then do $q = lc($q) . '%'; followed by my $sql = qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE '$q'}; my $sth = $dbh-prepare($sql); $sth-execute(); The above takes, say, 200-400 ms depending on the term. Or, in the case of the query that is really slow my $sth = $dbh-prepare(qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE ?}); $sth-execute($q); The above takes 3 to 4 seconds. Pretty much identical queries with identical results by vastly difference performance. Isn't this a DBI issue instead of a Postgres issue? -- Puneet Kishor
Re: same query, two different approaches, vastly different performance
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 my $sth = $dbh-prepare(qq{SELECT a, b, c FROM t WHERE Lower(a) LIKE ?}); $sth-execute($q); The above takes 3 to 4 seconds. Pretty much identical queries with identical results by vastly difference performance. Isn't this a DBI issue instead of a Postgres issue? It is neither. A query with a ? in it can be *anything*, so Postgres must design a plan that can handle it (in this case, a plan that does a sequential scan). It cannot know in advance if it the query is to be anchored or not. It's also not an option to expect DBI to keep track of what arguments you pass and create new plans based on the arguments. When you do that prepare and execute, DBD::Pg is asking Postgres to create a prepared statement, such that it can send just the arguments, and not the full statement, each time execute() is called. You can force it to *not* do so by issuing: $dbh-{pg_server_prepare} = 0; You can also do this at the statement handle generation time: see the DBD::Pg docs for more information. There are a lot of other considerations and things to think about when doing a LIKE $1 in Postgres, and in general the use of an unconstrained parameter indicates a need for something like full-text search, but the above attribute should allow you to see the same behavior on both of your original queries. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201202132246 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAk852mcACgkQvJuQZxSWSsh/TACgkmvbALYR4WsvaglWn1nWD050 8jEAoPuY/Imn0eYOGll1+gWY8UHdKHwz =UI4w -END PGP SIGNATURE-
Re: same query, two different approaches, vastly different performance
On Mon, Feb 13, 2012 at 9:52 PM, Greg Sabino Mullane When you do that prepare and execute, DBD::Pg is asking Postgres to create a prepared statement, such that it can send just the arguments, and not the full statement, each time execute() is called. You can force it to *not* do so by issuing: $dbh-{pg_server_prepare} = 0; You can also do this at the statement handle generation time: see the DBD::Pg docs for more information. That's awesome. Here's the answer to the PK's other question: The proposed patch to Postgres would cause plan selection to be deferred, in case of LIKE placeholder, until after pg can look at the placeholder(s), to see if it's anchored at the front. The resulting trick pg would generate 2**n plans at prepare time, where n is the number of placeholders that may or may not be anchored at the front. Extra extra points for figuring out that '^'||placeholder will always be so anchored, so it can use a sorted index, even though it might have to filter the results from that. With such a hack in place, and turned on by default, there would be one fewer situation where the pg_server_prepare attribute could be left on. -- on the outside of the bottom of the barrel, that's how much of an outlier it is.