I agree with Jonathan Gray about the two tables to have a random access to both what achievement's a player has got and ALL players that have x achievement's you will need two tables

Things you should look in to changing about your setup to be more efficient would be to only use a player table and cache the other stuff and use MR jobs if you do not need random access to it

Example Do you reality need access to this query?
What players have a given achievement?
This data can be mined form the player table with a MR job for updating stats or emailing or what every you use it for internally. If not provided to players but you need it for something internal and can wait a few hours for it then a MR job can provide this data. This is the query you will need to make the achievement table and will require you to update both the players and achievement table on changes to keep everything in sync If you reality have to have this query I would still thank about going with a cached version of it that gets updated every X hours/days depending on resources.

Who are the first 25 people to have a given achievement?
This could be cached somehow on local disk sense it will never change once 25 have made the achievement.

What are all the possible achievements?
This should be easy to cache outside hbase also sense you said the new achievement will not be common.
Just have to update it if you add new achievement's

What achievements does a given player NOT have?
Using a cache from above you could pull a players achievement's from the players table and remove the ones he has and use what's left as Jonathan Gray pointed to

With the above suggestions they should help to scale with less resources.

Billy Pearson

"Matthew Runo" <[email protected]> wrote in message news:[email protected]...
Hello folks!

I'm running a site (http://www.realmranker.com) that works on medium sized
data sets from the World of Warcraft armory. I'm currently doing everything in MySQL on a "Large" EC2 instance, but hitting some budgetary limits in how
much more data I can put there.

First, here is my current MySQL architecture:

player - table that holds the player data, currently about 500,000 rows
(expecting 2,000,000+ rows long term)
achievement - table that holds the root achievement data, currently a few
1000 rows, not expecting all that much growth
player_achievement - table that maps a player to all the achievement's
they've completed. Currentl about 60,000,000 rows, expecting 250,000,000+
long term

Here is the layout of player_achievment right now:
CREATE TABLE `player_achievement` (
 `playerAchievementId` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `playerId` int(11) unsigned NOT NULL,
 `achievementId` int(11) unsigned NOT NULL,
 `guildId` int(11) unsigned DEFAULT NULL,
 `dateCompleted` varchar(100) NOT NULL,
 PRIMARY KEY (`playerAchievementId`),
 UNIQUE KEY `playerId` (`playerId`,`achievementId`),
 KEY `playerId_2` (`playerId`),
 KEY `achievementId` (`achievementId`),
 KEY `guildId` (`guildId`),
 KEY `dateCompleted` (`dateCompleted`)
)

I'd like to move this table into HBase, since it's simply getting too large
for the amount of server resources I can afford. I assume that HBase would
preform faster than MySQL on my (small by HBase standards) dataset.

My question is simply what is the best way to do this? I'm assuming
something like this:

player_achievement table
achievementId row
players family
player1, player2, player3.....playerN columns
date completed would be the data under each column

I need to do the following types of queries:
- What players have a given achievement?
- What achievements does a give player have?
- What achievements does a given player NOT have?
- What are all the possible achievements?
- Who are the first 25 people to have a given achievement?

I'm sorry for the really high level of this email - I'm still getting up to
speed on HBase, and not quite sure it's exactly the answer to my prayers
that I think it might be..

--Matthew Runo



Reply via email to