Simon,
Thanks for the feedback.
I don't have all the details of the schema and workload. Just an
interesting idea that was presented to me.
I think the idea is to split a lengthy secondary key lookup into 2 primary
key lookups and reduce the cost of clustering secondary key with primary
key data by using a shorter INT type surrogate key. Another downside is
the possible need of foreign keys and added complexity of insertions and
multi-column updates.

Have you found primary key lookups to be at least twice as fast as
secondary key lookups with VARCHAR type primary key in InnoDB? The whole
idea is based on the assumption that it is.

Also, MyISAM conversion is an option too. Have you found the table
maintenance to be a significant overhead? I've experienced MyISAM table
corruptions in production and I'm more inclined to go with InnoDB for its
reliability. This is a fairly important table.

Any insight would be much appreciated.
Kyong

> kimky...@fhda.edu ("Kyong Kim") writes:
>
>> I was wondering about a scale out problem.
>> Lets say you have a large table with 3 cols and 500+ million rows.
>>
>> Would there be much benefit in splitting the columns into different
>> tables
>> based on INT type primary keys across the tables?
>
> To answer your question properly requires more information:
>
> 1. Expected table structure. Can you show the current CREATE TABLE xxx\G
> output?
> 2. Expected use cases to extract data?
> 3. Do you expect to delete data frequently, or are you only inserting
> data,
>    or is there a mix of inserts and deletes? If so provide more info.
>
> I've come across situations where a large table like this caused lots
> of problems. There were lots of concurrent "delete batches" (cleaning
> up) and at the same time lot of inserts. At the same time there were
> large groups of selects to collect certain sets of data for
> presentation. Perhaps you are doing something similar? If you do
> something similar you may find that it's extremely important to get
> the keys right especially the primary keys so that data retrieval (for
> SELECTs or DELETEs) is as fast as possible (using clustered indexes
> [PRIMARY KEY in innodb]). If not or if the queries overlap you may
> find performance degredation a big issue as Innobase manages the locks
> to ensure that the concurrent statements don't interfere.
>
> You can also use merge tables sitting on top of MyISAM "per year" or
> "per whatever" data in each table. That avoids you having to find data
> for 2009 as you look in table xxx_2009, so this can be a big
> win. MyISAM has the inconvenience that if the server ever crashes
> recovery of these tables can be very timeconsuming.  Innodb has a
> larger footprint for the same data.
>
> So it's hard without more information on the structure and the use
> cases to answer your question. In fact if you have the time, try out
> and benchmark different approaches and see which is best for your
> requirements. Just remember that as the data grows the initial
> measurements may not be consistent with behaviour you see later. Also
> if you are looking at a large amount of data like this appropriate
> server tuning can influence performance significantly.
>
> Hope this helps.
>
> Simon
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=kimky...@fhda.edu
>


Inst. Web Programmer
CMDBA 5.0

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to