I'd appreciate some advice on how best to handle a biggish dataset of around 5 million rows. At the moment, I have a single table with four fields and a composite primary key:

CREATE TABLE mytable (
  partcode varchar(20),
  region   varchar(10),
  location varchar(50),
  qty      int(11),
  PRIMARY KEY (partcode, region, location)
);

The biggest variable is partcode, with around 80,000 distinct values. For statistical purposes, I need to be able to select a sum(qty) for a given combination of the other three fields, e.g.:

  SELECT SUM(qty) FROM mytable
  WHERE partcode = 'x' AND region = 'y' AND location = 'z';

as well as generate a list of partcodes and total quantities in a given region and location, e.g.:

  SELECT SUM(qty), partcode FROM mytable
  WHERE region = 'y' AND location = 'z'
  GROUP BY partcode;

The selection is done via a web-based interface and, unfortunately, it's too slow, so I want to optimise the table for faster reads. Speed of updating is less crucial, as the data isn't updated in real time: the table is refreshed by a nightly batch job that runs outside normal working hours (and, apart from the rare occasion when a location is added or removed, the only thing that changes is the value of qty).

Does anyone have any suggestions? My initial thought is to replace the region and location varchar fields with int fields keyed to separate lookup tables of region and location names, roughly as sketched below. Would that help, or is there a better way?
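
Something along these lines, where the lookup table and column names (regions, locations, region_id, location_id) are just placeholders for illustration:

  CREATE TABLE regions (
    region_id int NOT NULL AUTO_INCREMENT,
    region    varchar(10) NOT NULL,
    PRIMARY KEY (region_id)
  );

  CREATE TABLE locations (
    location_id int NOT NULL AUTO_INCREMENT,
    location    varchar(50) NOT NULL,
    PRIMARY KEY (location_id)
  );

  CREATE TABLE mytable (
    partcode    varchar(20),
    region_id   int,
    location_id int,
    qty         int(11),
    PRIMARY KEY (partcode, region_id, location_id)
  );

The second query above would then become a join, something like:

  SELECT SUM(p.qty), p.partcode
  FROM mytable p
  JOIN regions r ON r.region_id = p.region_id
  JOIN locations l ON l.location_id = p.location_id
  WHERE r.region = 'y' AND l.location = 'z'
  GROUP BY p.partcode;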

Mark
--
http://mark.goodge.co.uk - my pointless blog
http://www.good-stuff.co.uk - my less pointless stuff
