Hi Tomislav,

Some comments inline...

Tomi N/A wrote:
Hi everyone,

First of all, it's great to have an RDBMS like Derby around!
Now, on to the 2 problems I'm trying to resolve...

Situation: 1 table, 15+ million rows, 4 integer columns (id, id_group1,
id_group2, id_group3), 750MB on disk, Derby running in network server mode

Problem 1:

select id_group, count(id)
group by id_group
Not to be tiresome, but just to make sure we understand the query: I think you mean id_group1 instead of id_group.
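For reference, here is the query as I read it; the table name FACTS is just a placeholder, since the original message doesn't name the table:

  select id_group1, count(id)
  from FACTS
  group by id_group1;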
This takes a couple of minutes, basically regardless of what I tried
with indexes. I'd like it to take <10s if possible.

Problem 2:

It seemed to me that the query triggers a sequential scan (I see a lot
of disk activity), so I created a RAM disk and restored the database
to it, expecting the query to be close to instantaneous. Better, but
still horrible (80 seconds).
Which brings me to the crux of the 2nd problem: this in-memory
exercise was without indexes, so I tried to create some (id_group1 and
id for a start). However, the indexes seem to be _huge_ at 300-400MB
each, and they spike at about 700MB before the index is completely
created. I was quite surprised to see the indexes take more than
several dozen MB.
So, is there a way to reduce index size? And would any kind of indexes
help the type of query I'm interested in?
It sounds as though you may have created an index on ( id_group1, id ). At first blush, I would expect that index to improve the performance. Did that not happen?
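If you haven't already, it may be worth double-checking that the index exists and that the optimizer actually uses it. Here is a rough sketch of what I would run in ij; the table name FACTS, the index name idx_group1_id, and the APP schema are all just placeholders:

  -- create the index on (id_group1, id), if it isn't there yet
  create index idx_group1_id on FACTS (id_group1, id);

  -- see how much space the table and each of its indexes use
  select conglomeratename, isindex,
         numallocatedpages * pagesize as allocated_bytes
  from table (syscs_diag.space_table('APP', 'FACTS')) t;

  -- turn on runtime statistics, run the query, then inspect the plan
  call syscs_util.syscs_set_runtimestatistics(1);
  call syscs_util.syscs_set_statistics_timing(1);
  select id_group1, count(id) from FACTS group by id_group1;
  values syscs_util.syscs_get_runtimestatistics();

If the printed plan still shows a table scan on FACTS rather than an index scan, that would explain the timings you're seeing.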

Thanks,
-Rick
(Machine: Ubuntu Jaunty, 1.7GHz Pentium M, 2GB RAM, 5400 rpm disk)

Thanks,
Tomislav

--
www.PanBI.org: business intelligence everywhere!
