It is impossible to say exactly how the query will perform without
knowing the actual values in the IN list. But it is possible to get
some idea by assuming worst-case behavior, and from that I am going to
guess you will never come close to 100 ms against an uncached database
on hardware with a standard disk-based hard drive.
I do think the query may go faster with index and query tweaking, but
100 ms against an uncached db, with non-clustered unique values in that
IN list, is never going to happen. Adding up just what is posted, it
looks like this is a 1.2 GB database.
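To show where that 1.2 GB figure comes from, here is the back-of-envelope
arithmetic, summing the allocated-page counts from the space output below
and multiplying by the 4096-byte page size (the page counts are copied
straight from the posted figures):

```python
# Rough database size from the posted space table: sum of allocated
# pages per table/index, times the 4096-byte page size shown there.
page_counts = [103, 55, 160, 1, 10497, 5340, 16708, 150794, 112177]
page_size = 4096

total_bytes = sum(page_counts) * page_size
print(total_bytes)              # 1211740160 bytes
print(total_bytes / 1024 ** 3)  # roughly 1.1 GiB, i.e. about 1.2 GB decimal
```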
You posted the space usage for the tables and indexes. The interesting
ones are the big ones: you have five tables or indexes over 1000 pages.
If, in the worst case, your 1000-value IN list lands on 1000 different
pages, then Derby is going to need at least 1000 I/Os to fetch them. I
usually use a back-of-envelope maximum of 100 I/Os per second; even if
your disk's specs claim a higher rate, this I/O is not going to be
streamed as fast as possible by this query. It asks for a page,
processes it, does some join work, then later asks for another page,
and so on:
CATEGORY_MASTER 0 103 0 0 4096 0
SQL090406091302600 1 55 0 0 4096 0
SQL090406091302601 1 160 0 1 4096 0
SQL090406091302730 1 1 0 1 4096 0
OBJECT_MASTER 0 10497 0 0 4096 0
SQL090406091302760 1 5340 0 1 4096 0
SQL090406091302761 1 16708 0 410 4096 0
OBJECT_CATEGORY_MAPPING 0 150794 0 0 4096 0
OBJECT_CATEGORY_MAPPING_INDEX 1 112177 0 57 4096 0
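Putting numbers on that worst case (the 100 I/Os per second is my
back-of-envelope assumption for a seek-bound, one-page-at-a-time
workload, not a measured figure):

```python
# Worst case: each of the 1000 IN-list values lands on a different
# uncached page, and each page fetch is a separate random I/O.
ios = 1000
ios_per_second = 100  # assumed rate for non-streamed, seek-bound reads

seconds = ios / ios_per_second
print(seconds)  # 10.0 seconds, two orders of magnitude over the 100 ms goal
```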
There was work done in 10.3 on IN lists, making them perform more like
unions; see DERBY-47. So if you have a choice of releases, I would
suggest you move to 10.4 and post the query plan and results against
that. The basic idea of that change was to allow the system to do one
probe into an index for each value in the IN list. Before this change,
Derby could only sort the values in the IN list and then limit an index
scan to the range between the smallest and largest values in the list.
So, for instance, for OBJECT_CATEGORY_MAPPING_INDEX, the worst case for
a scan is reading 112177 pages to find the 1000 rows, where the worst
case for probing would be 1000 pages (plus B-tree parent index pages,
but those are much more likely to be cached). The problem is that there
is definite overhead for probing one row at a time, and scans go much
faster per page, so there is a crossover point. For example, I would
guess it would likely be better to scan all 112177 pages than to do
100,000 probes.
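A minimal sketch of that crossover reasoning (the per-page and per-probe
costs here are made-up illustrative numbers, not measured Derby
constants):

```python
# Compare a full index scan against per-value probes, assuming a
# sequential scan reads pages much faster than random probes can run.
# Both cost constants are illustrative assumptions only.
def scan_cost(pages, ms_per_scanned_page=0.1):
    """Cost of scanning the whole index, in milliseconds."""
    return pages * ms_per_scanned_page

def probe_cost(probes, ms_per_probe=10.0):
    """Cost of probing the index once per IN-list value, in milliseconds."""
    return probes * ms_per_probe

# With 1000 values, probing beats scanning the 112177-page index...
assert probe_cost(1000) < scan_cost(112177)
# ...but with 100,000 values the full scan wins.
assert scan_cost(112177) < probe_cost(100_000)
```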
arindam.bhattacharjee wrote:
Hello Knut,
Thanks for your quick response. This is a sample database which I
created just for testing out the performance, and it has been written to
only once, in one go. I tried temp tables, but that is just too slow.
The values in the IN clause come from another source and I can't modify
that.
However, I will try out what you suggest below. But still, I wanted to
get your take on whether Derby can respond in sub-100-millisecond time
with the table sizes you see above?
I find that:
select category_master.category_name, count(category_master.category_name)
as category_count
from
(
select internal.object_id
from
(
values(1001) union all
values(1001) union all
values(1001) union all
values(1001) union all
values(1002) union all
values(1001) union all
values(1001) union all
values(1001) union all
values(1001) union all
values(1001) union all
values(1001) union all
values(1001) union all .......
values(9999)
) as internal(object_id)
) as external_ids,
object_master,
category_master,
object_category_mapping
where
external_ids.object_id = object_master.object_id and
external_ids.object_id = object_category_mapping.object_id and
object_master.object_id = object_category_mapping.object_id and
category_master.category_id = object_category_mapping.category_id
group by
category_master.category_name
order by
category_count desc
is much faster. Unfortunately, connection.prepareStatement() is taking
way too much memory (both stack and heap; I have a constraint of 256 MB
max memory for my JVM), which goes beyond my application's resources. Is
there a way I can precompile some SQL statements which are very
expensive to parse during execution?
Best regards,
Arindam.
Knut Anders Hatlen wrote:
"arindam.bhattacharjee" <[email protected]> writes:
Hello,
I would like my query below to return within 100 millisecs. Please help
me. The values for the IN clause come from outside, hence I cannot
really change the IN clause to a join on an existing table.
Hi Arindam,
Does the query run faster if you compress all the tables involved, or if
you drop and recreate all the indexes? If so, it is likely that the
index cardinality statistics are out of date, which may make the
optimizer pick a bad execution plan. Currently, index cardinality
statistics are only updated at index creation time, when tables are
compressed, and when columns are dropped. A more automatic solution is
being worked on. For more details, see:
https://issues.apache.org/jira/browse/DERBY-269
https://issues.apache.org/jira/browse/DERBY-3788
http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
You may be experiencing some other problem, but this is a problem that
keeps coming up, so I think it's worth checking.
Hope this helps,
--
Knut Anders