On 03.05.2006 10:53, Bealach Na Bo wrote:

Thank you all for your comments. I think a temporary table is
unavoidable if I want to control the way the query is done here (as
opposed to letting the planner do its job).

The @names array is the result of an earlier DBI query.


So why don't you just combine the earlier query and this query into a single query?

If @names comes from something like "SELECT something as names FROM a lot of tables WHERE a lot of conditions match", merge it with the query you posted earlier:


    SELECT j.job_log_id
    FROM job_log j, node n
    WHERE n.name in ( SELECT something as names
                      FROM a lot of tables
                      WHERE a lot of conditions match )
    AND   n.type = ?
    AND   n.usage = ?
    AND   n.node_id = j.node_id

This seems to be the cleanest and most natural way to me. It even allows the database's query optimizer to do its job better than with two queries that, as far as the optimizer can tell, are unrelated to each other.
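
For what it's worth, a minimal Perl/DBI sketch of the combined query -- assuming $dbh is an open DBD::Pg handle, "some_source" is a hypothetical stand-in for whatever query currently produces @names, and 'B' / 'LIVE' are the bind values from your prep_sql:

my $sth = $dbh->prepare(<<'EOSQL');
    SELECT j.job_log_id
    FROM job_log j, node n
    WHERE n.name IN ( SELECT s.name
                      FROM some_source s )  -- hypothetical stand-in for the
                                            -- query that currently fills @names
    AND   n.type    = ?
    AND   n.usage   = ?
    AND   n.node_id = j.node_id
EOSQL
$sth->execute('B', 'LIVE');
while ( my ($job_log_id) = $sth->fetchrow_array ) {
    # process each job_log_id here
}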

Temporary tables are not needed here (and cause more trouble than they are worth) -- unless @names comes from another database that has no way to link to the database you want to query "now". If, for example, we are talking about two Oracle DBs, even on different machines, even with different versions of Oracle, even with different operating systems, and even on different platforms, there is a way to connect the two databases so that the contents of one appear to be in the other. In that case, you should ask an Oracle expert for more details about "database links". (I know *this* setup from my former job, but I never had to work with it, so I lack some knowledge here.) I think MSSQL has a similar capability. For MySQL and PostgreSQL this should at least not be impossible; I would recommend reading their manuals.
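
For PostgreSQL in particular, the contrib module "dblink" provides something like this: it runs a query on another PostgreSQL database and lets you treat the result as a local row source. A rough sketch only, assuming dblink is installed; the connection string and the remote query are hypothetical placeholders:

    SELECT j.job_log_id
    FROM job_log j, node n
    WHERE n.name IN ( SELECT t.name
                      FROM dblink('dbname=otherdb host=otherhost',
                                  'SELECT name FROM some_remote_table')
                           AS t(name text) )
    AND   n.type = ?
    AND   n.usage = ?
    AND   n.node_id = j.node_id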


I agree, a Perl snippet to generate scalar(@names) placeholders would
probably push the query over the maximum query size, as well as various
other limits. Probably a bad idea.

And iterating over the array, issuing one query per element of @names,
is not something I want to try.

So I'm left with temporary tables.

No. Rethink the way you issue the queries. You would be stuck with temporary tables if @names came from a plain file or a device, but not if @names comes from the same or a similar database.

Alexander


Bealach



On 5/2/06, Alexander Foken <[EMAIL PROTECTED]> wrote:

If @names held only a few records, you could write something like
    select ... where name in (?,?,?,?,?)
using some Perl code to generate the placeholders ( join(',', ('?') x
scalar @names) ). But with 100,000 records, I think this would bring
DBI, DBD::Pg and the Pg SQL parser to their limits. Usually, I strongly
oppose using temporary tables. But in your case, I would create a
temporary table (using some unique identifier, like hostname + PID +
random number, as the table name), insert @names into that table, and
then let Pg do the job:

    create table temptable2744143 ....;

    insert into temptable2744143  (name) values (?);
       (for each record in @names)

     SELECT j.job_log_id
     FROM job_log j, node n , temptable2744143 t
     WHERE n.name = t.name
     AND   n.type = ?
     AND   n.usage = ?
     AND   n.node_id = j.node_id;

     drop table temptable2744143;
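
Driven from Perl/DBI, the whole round trip might look roughly like this -- a sketch only: it assumes $dbh is a connected DBD::Pg handle, @names holds plain strings, and the "text" column type and the PID-plus-random table name are my assumptions:

# Rough sketch of the temporary-table approach above, driven from DBI.
my $table = sprintf 'temptable%d_%d', $$, int rand 1_000_000;
    # unique-ish name from PID + random number (add the hostname if needed)

$dbh->do("CREATE TABLE $table (name text)");

$dbh->begin_work;                        # batch the inserts in one transaction
my $ins = $dbh->prepare("INSERT INTO $table (name) VALUES (?)");
$ins->execute($_) for @names;            # one execute per record in @names
$dbh->commit;

my $sth = $dbh->prepare(<<EOSQL);
    SELECT j.job_log_id
    FROM job_log j, node n, $table t
    WHERE n.name    = t.name
    AND   n.type    = ?
    AND   n.usage   = ?
    AND   n.node_id = j.node_id
EOSQL
$sth->execute('B', 'LIVE');
while ( my ($job_log_id) = $sth->fetchrow_array ) {
    # process each job_log_id here
}

$dbh->do("DROP TABLE $table");

(DBI's execute_array, or PostgreSQL's COPY support in DBD::Pg, could speed up the bulk insert; the plain loop is just the simplest way to show the idea.)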


Where does @names come from?

Alexander

On 01.05.2006 14:21, Bealach Na Bo wrote:

> Hi folks,
>
> Apologies if this has been asked and resolved - I've spent half a day
> searching various archives, but have not found anything.
>
> My problem is the following:
>
> I have a very large PostgreSQL table (ca. 3 million records) and need to
> replace a nested SQL query with an "iterative" one, so I end up with:
>
>
>
> @names is a very large array - of the order of 100000 elements.
>
>
> sub prep_sql {
>    my $stmt = <<EOSQL;
>      SELECT j.job_log_id
>      FROM job_log j, node n
>      WHERE n.name in ?       <== (@names) should be here, but the DBI
>                                  documentation says the bind value
>                                  must be a scalar
>      AND   n.type = ?
>      AND   n.usage = ?
>      AND   n.node_id = j.node_id
> EOSQL
>
>    my @bind = (@names,'B','LIVE'); <=========can't use @names here
>    return ($stmt,@bind);
> } # end sub prep_sql
>
>
> I know how to use SQL subqueries to do this in one go in PostgreSQL,
> but I think this iterative approach might be a lot faster?
>
> Has anyone tried the same sort of thing?
>
> Regards,
>
> Bealach
>


--
Alexander Foken
mailto:[EMAIL PROTECTED]  http://www.foken.de/alexander/





--
Alexander Foken
mailto:[EMAIL PROTECTED]  http://www.foken.de/alexander/
