At 08:06 PM 7/12/2009, Morten wrote:

Mike, you're right - sorry.

I've been reading High Performance MySQL today and got some great tips
from that which will help a lot.

Yes, it is a good book. I hope you have the 2nd edition.

I think the fundamental challenge now is that the table contains a lot
of timestamps, and querying against these involves multiple range
conditions, which makes indexing hard.

If you can get rid of the DATETIME columns and switch to just DATE, it may speed up the indexes.
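
For example, a minimal sketch, assuming the created_at column described
below is currently a DATETIME (DATE is 3 bytes to DATETIME's 8, so the
index entries shrink):

  -- Drops the time-of-day portion, so test on a copy first.
  ALTER TABLE actions MODIFY created_at DATE;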


The "actions" table has the following columns (of relevance to the
example):

  status_id
  assignee_id
  company_id
  created_at
  assigned_at
  opened_at
  updated_at
  verified_at
  due_at
  solved_at
  closed_at

Queries could be:

  "Show all actions which are assigned to Tom, were created in
October and solved in November"
  "Show all open actions which were opened before August, do not have
an assignee and were verified last week"
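
In SQL, the first of those might look roughly like this (the assignee
id and year are made up for illustration):

  SELECT * FROM actions
  WHERE assignee_id = 42                            -- "Tom" (assumed id)
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01'
    AND solved_at  >= '2009-11-01' AND solved_at  < '2009-12-01';

A composite B-tree index stops matching at its first range column, so
the second date range has to be filtered row by row - which is exactly
what makes these queries hard to index.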

Queries like these, which combine easily indexable fields (status_id,
assignee_id, company_id) with multiple conditions on different date
ranges, are what's difficult. The table holds about 2,500,000 records
and grows at a daily rate of about 50,000 records (and that rate is
itself growing). Once an action has been closed, it gets status
"closed" and is no longer of interest; about 70% of the records in the
table have status "closed".

So why not have two tables: "Cases_Active" for the cases currently open, and "Cases_Closed" for everything else? This greatly reduces the number of rows in each table and the depth of the index tree. Your application selects the appropriate table when the question mentions "Active" or "Closed" cases. You can also define a MERGE table, "Cases_All", that logically combines both tables, so you can still run a query across closed and active cases when you need to - or just do a UNION over the two tables via two separate queries.
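
A minimal sketch of that layout (column list trimmed to the essentials;
MERGE requires identical MyISAM tables):

  CREATE TABLE Cases_Active (
    id          INT UNSIGNED NOT NULL,
    status_id   INT UNSIGNED,
    assignee_id INT UNSIGNED,
    created_at  DATETIME,
    solved_at   DATETIME,
    KEY idx_assignee (assignee_id, created_at)
  ) ENGINE=MyISAM;

  CREATE TABLE Cases_Closed LIKE Cases_Active;

  -- Logical union of the two, for the occasional cross-state query.
  CREATE TABLE Cases_All (
    id          INT UNSIGNED NOT NULL,
    status_id   INT UNSIGNED,
    assignee_id INT UNSIGNED,
    created_at  DATETIME,
    solved_at   DATETIME,
    KEY idx_assignee (assignee_id, created_at)
  ) ENGINE=MERGE UNION=(Cases_Active, Cases_Closed) INSERT_METHOD=LAST;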

You didn't mention how many queries per second you need to handle. A couple hundred queries per second can be handled by MyISAM because it has an excellent query cache that can be tweaked. You can also load the indexes into memory if needed.
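
Both of those knobs exist in stock MySQL; the sizes here are only
placeholders:

  -- Give the query cache room to work (MyISAM benefits most).
  SET GLOBAL query_cache_size = 134217728;  -- 128 MB
  SET GLOBAL query_cache_type = 1;          -- cache eligible SELECTs

  -- Preload the table's MyISAM indexes into the key cache.
  LOAD INDEX INTO CACHE actions;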

The main problem, as you mentioned, is trying to index all of the date fields. I don't think this is necessary. I've noticed that MySQL 5.1 (because it was rushed out the door) does not always use the correct index, and I often have to force it to use the proper one with FORCE INDEX. So run EXPLAIN on your slow queries (make sure you log them) and find out which index they are actually using.
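
For instance (the index name here is hypothetical):

  -- See which index the optimizer actually picks.
  EXPLAIN SELECT * FROM actions
  WHERE assignee_id = 42
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01';

  -- If it picks the wrong one, pin the right one explicitly.
  SELECT * FROM actions FORCE INDEX (idx_assignee_created)
  WHERE assignee_id = 42
    AND created_at >= '2009-10-01' AND created_at < '2009-11-01';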

You could use a MEMORY table, but those don't work with MERGE tables. I would only consider it if the table gets updates every second or two, since that keeps flushing the query cache. I'm not sure how much money you're willing to throw at this project, but I know of some additional hardware that can squeeze out more speed.

I think what I'm looking for now is some way to encode the different
date values into a single column which can be indexed, with the value
calculated and updated by a background job. This will cost some
precision, but I hope it can be done. Otherwise I'm back to
considering alternative index/query mechanisms.
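
One hypothetical way to do that - pack the two dates at month precision
into a single indexed value:

  ALTER TABLE actions
    ADD COLUMN created_solved_ym BIGINT UNSIGNED,
    ADD KEY idx_created_solved_ym (created_solved_ym);

  -- Background job: created 2009-10, solved 2009-11 -> 200910200911
  UPDATE actions
  SET created_solved_ym = EXTRACT(YEAR_MONTH FROM created_at) * 1000000
                        + EXTRACT(YEAR_MONTH FROM solved_at);

  -- Two range conditions collapse into a single equality lookup,
  -- at the cost of day-level precision.
  SELECT * FROM actions WHERE created_solved_ym = 200910200911;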

Does my problem make a little more sense now? Thanks.

Yup!  :-)

Mike


Morten



Let's say I would like to see all actions that were created in October
and solved in November.


On Jul 12, 2009, at 3:54 PM, mos wrote:

Morten,
      Perhaps you could also add how many rows are in the table, how
many rows are added each day, what are the column types, and what do
the search queries look like?

Mike

At 11:39 AM 7/12/2009, Morten wrote:

Hi,

I'm working on a table that has about 12 columns against which
arbitrary queries must perform really well. Currently there are a lot
of indexes on the table, but I'm hitting some problems - and adding
more indexes seems like a slippery slope (there are ~15 multi-column
indexes already, and I'd like that number reduced).

So I'm looking for a way out and I'm currently considering:

* Building a memory table on top of the existing table
* Indexing with Sphinx and throwing the queries against Sphinx instead
* Using a different "in-memory" DB like Tokyo Cabinet for the queries
* Building a series of "reporting tables" which each handle a subset
of the supported queries

All of the solutions would maintain the current table for consistency,
and a couple of minutes of lag is acceptable.

I'm tempted to go for the memory table and update it depending on
which rows have been updated in the parent table since the last
refresh. Eliminating duplicates could be a challenge, unless I build a
new table for each update and then "rename" the tables - but that's
costly in terms of memory.
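
The swap itself can at least be made atomic (table names and the
"closed" status id below are assumed):

  -- Build a fresh in-memory copy, then swap it in with one rename.
  CREATE TABLE actions_mem_new ENGINE=MEMORY
    SELECT * FROM actions WHERE status_id <> 3;  -- 3 = "closed"

  RENAME TABLE actions_mem     TO actions_mem_old,
               actions_mem_new TO actions_mem;
  DROP TABLE actions_mem_old;

Note that MEMORY tables can't hold TEXT or BLOB columns, so any such
columns would have to be left out of the SELECT.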

What do people usually do in this situation? Any other solutions to
consider?

Thanks,

Morten


