At 08:06 PM 7/12/2009, Morten wrote:

Mike, you're right - sorry.

I've been reading High Performance MySQL today and got some great tips
from that which will help a lot.

Yes, it is a good book. I hope you have the 2nd edition.

I think the fundamental challenge now is that the table contains a lot
of timestamps, and querying against these involves multiple range
conditions, which makes indexing hard.

If you can get rid of the DATETIME columns and switch to just DATE, it may speed up the indexes.
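
For example, a minimal sketch, assuming the created_at column described
below is currently a DATETIME (DATE is 3 bytes to DATETIME's 8, so the
index entries shrink):

  -- Drops the time-of-day portion, so test on a copy first.
  ALTER TABLE actions MODIFY created_at DATE;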


The "actions" table has the following columns (of relevance to the
example):

  status_id
  assignee_id
  company_id
  created_at
  assigned_at
  opened_at
  updated_at
  verified_at
  due_at
  solved_at
  closed_at

Queries could be:

  "Show all actions which are assigned to Tom, were created in
October and solved in November"
  "Show all open actions which were opened before August, do not have
an assignee and were verified last week"
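
In SQL, the first of those might look roughly like this (the assignee
id and year are made up for illustration):

  SELECT * FROM actions
  WHERE assignee_id = 42                            -- "Tom" (assumed id)
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01'
    AND solved_at  >= '2009-11-01' AND solved_at  < '2009-12-01';

A composite B-tree index stops matching at its first range column, so
the second date range has to be filtered row by row - which is exactly
what makes these queries hard to index.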

Queries like these, which combine easily indexable fields (status_id,
assignee_id, company_id) with multiple conditions on different date
ranges, are what's difficult. The table holds about 2,500,000 records
and grows at a daily rate of about 50,000 records (and that rate is
itself growing). Once an action has been closed, it gets status
"closed" and is no longer of interest; about 70% of the records in the
table have status "closed".

So why not have two tables: "Cases_Active" for the cases currently open, and "Cases_Closed" for everything else? This greatly reduces the number of rows in each table and the depth of the index tree. Your application selects the appropriate table when the question mentions "Active" or "Closed" cases. You can also define a MERGE table, "Cases_All", that logically combines both tables, so you can still run a query across closed and active cases when you need to - or just do a UNION over the two tables via two separate queries.
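
A minimal sketch of that layout (column list trimmed to the essentials;
MERGE requires identical MyISAM tables):

  CREATE TABLE Cases_Active (
    id          INT UNSIGNED NOT NULL,
    status_id   INT UNSIGNED,
    assignee_id INT UNSIGNED,
    created_at  DATETIME,
    solved_at   DATETIME,
    KEY idx_assignee (assignee_id, created_at)
  ) ENGINE=MyISAM;

  CREATE TABLE Cases_Closed LIKE Cases_Active;

  -- Logical union of the two, for the occasional cross-state query.
  CREATE TABLE Cases_All (
    id          INT UNSIGNED NOT NULL,
    status_id   INT UNSIGNED,
    assignee_id INT UNSIGNED,
    created_at  DATETIME,
    solved_at   DATETIME,
    KEY idx_assignee (assignee_id, created_at)
  ) ENGINE=MERGE UNION=(Cases_Active, Cases_Closed) INSERT_METHOD=LAST;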

You didn't mention how many queries per second you need to handle. A couple hundred queries per second can be handled by MyISAM because it has an excellent query cache that can be tweaked. You can also load the indexes into memory if needed.
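
Both of those knobs exist in stock MySQL; the sizes here are only
placeholders:

  -- Give the query cache room to work (MyISAM benefits most).
  SET GLOBAL query_cache_size = 134217728;  -- 128 MB
  SET GLOBAL query_cache_type = 1;          -- cache eligible SELECTs

  -- Preload the table's MyISAM indexes into the key cache.
  LOAD INDEX INTO CACHE actions;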

The main problem, as you mentioned, is trying to index all of the date fields. I don't think this is necessary. I've noticed that MySQL 5.1 (because it was rushed out the door) does not always use the correct index, and I often have to force it to use the proper one with FORCE INDEX. So run EXPLAIN on your slow queries (make sure you log them) and find out which index they are actually using.
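
For instance (the index name here is hypothetical):

  -- See which index the optimizer actually picks.
  EXPLAIN SELECT * FROM actions
  WHERE assignee_id = 42
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01';

  -- If it picks the wrong one, pin the right one explicitly.
  SELECT * FROM actions FORCE INDEX (idx_assignee_created)
  WHERE assignee_id = 42
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01';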

You could use a MEMORY table, but those don't work with MERGE tables. I would only consider it if the table gets updates every second or two, since that keeps flushing the query cache. I'm not sure how much money you're willing to throw at this project, but I know of some additional hardware that can squeeze out more speed.

I think what I'm looking for now is some way to encode the different
date values into a single column which can be indexed, with the value
calculated and updated by a background job. This will cost some
precision, but I hope it can be done. Otherwise I'm back to
considering alternative index/query mechanisms.
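
One hypothetical way to do that - pack the two dates at month precision
into a single indexed value:

  ALTER TABLE actions
    ADD COLUMN created_solved_ym BIGINT UNSIGNED,
    ADD KEY idx_created_solved_ym (created_solved_ym);

  -- Background job: created 2009-10, solved 2009-11 -> 200910200911
  UPDATE actions
  SET created_solved_ym = EXTRACT(YEAR_MONTH FROM created_at) * 1000000
                        + EXTRACT(YEAR_MONTH FROM solved_at);

  -- Two range conditions collapse into a single equality lookup,
  -- at the cost of day-level precision.
  SELECT * FROM actions WHERE created_solved_ym = 200910200911;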

Does my problem make a little more sense now? Thanks.

Yup!  :-)

Mike


Morten



Let's say I would like to see all actions that were created in October
and solved in November.


On Jul 12, 2009, at 3:54 PM, mos wrote:

Morten,
      Perhaps you could also add how many rows are in the table, how
many rows are added each day, what are the column types, and what do
the search queries look like?

Mike

At 11:39 AM 7/12/2009, Morten wrote:

Hi,

I'm working on a table that has about 12 columns against which
arbitrary queries must perform really well. Currently there are a lot
of indexes on the table, but I'm hitting some problems - and adding
more indexes seems like a slippery slope (there are ~15 multi-column
indexes already, and I'd like that number reduced).

So I'm looking for a way out and I'm currently considering:

* Building a memory table on top of the existing table
* Indexing with Sphinx and throwing the queries against Sphinx instead
* Using a different "in-memory" DB like Tokyo Cabinet for the queries
* Building a series of "reporting tables" which each handle a subset
of the supported queries

All of the solutions would maintain the current table for consistency,
and a couple of minutes of lag is acceptable.

I'm tempted to go for the memory table and update it depending on
which rows have been updated in the parent table since the last
refresh. Eliminating duplicates could be a challenge, unless I build a
new table for each update and then "rename" the tables - but that's
costly in terms of memory.
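
The swap itself can at least be made atomic (table names and the
"closed" status id below are assumed):

  -- Build a fresh in-memory copy, then swap it in with one rename.
  CREATE TABLE actions_mem_new ENGINE=MEMORY
    SELECT * FROM actions WHERE status_id <> 3;  -- 3 = "closed"

  RENAME TABLE actions_mem     TO actions_mem_old,
               actions_mem_new TO actions_mem;
  DROP TABLE actions_mem_old;

Note that MEMORY tables can't hold TEXT or BLOB columns, so any such
columns would have to be left out of the SELECT.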

What do people usually do in this situation? Any other solutions to
consider?

Thanks,

Morten


