I would also consider a DB for this... 10M records and 2 columns is not a lot of data, so I would look at keeping it in memory with a DB index or an in-memory hash for querying. (We keep tables of 150M, 30M and 10M records and join between them, with around 25 indexes on the 150M table - on a single machine with 64G RAM, MySQL.)
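The in-memory hash option above can be sketched roughly as follows. This is a minimal illustration, not anything from the thread: the String key / int value types, key format, and 1M test size are all my assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the in-memory hash approach: hold every key->value pair in a
// ConcurrentHashMap and overwrite the values in place on each refresh cycle.
// Key/value types (String -> int) are assumptions, not from the thread.
public class InMemoryStore {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    public void put(String key, int value) { map.put(key, value); }
    public Integer get(String key)         { return map.get(key); }
    public int size()                      { return map.size(); }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        int n = 1_000_000; // 1M pairs here; the thread's target is 10M

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) store.put("key" + i, i); // initial load
        long loadMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) store.put("key" + i, i + 1); // the periodic refresh
        long updateMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("loaded " + store.size() + " pairs in " + loadMs
                + " ms, refreshed in " + updateMs + " ms");
    }
}
```

On commodity hardware a full in-place refresh of this kind typically takes seconds, not minutes, which is why holding 10M pairs in memory on one box is worth benchmarking before reaching for a distributed solution.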
I would spend a day testing H2 and MySQL in MyISAM mode, which is a fast engine, or try their in-memory mode - we have seen 600 "query+update" operations per second at peak - so this is only 3 machines to do 10M in 10 minutes. H2 in embedded mode is also impressively fast, and I think it has a memory-only mode as well.

Are you able to build the new version offline? Have you considered writing the new version to a simple CSV file and loading it into a DB? That is also a very fast operation.

Make sure you use the smallest possible type for the data to improve performance - if a TINYINT is enough, use it. The difference will be huge.

It also depends on how you query your data as to which indexes you need built and what the load will be - perhaps you can describe your problem in more detail?

Cheers
Tim

On Wed, Dec 24, 2008 at 11:45 AM, Steve Loughran <[email protected]> wrote:
> aakash shah wrote:
>>
>> We can assume that each record has only one key->value mapping. The value
>> will be updated every minute. Currently we have 1 million of these
>> key->value pairs, but I have to make sure that we can scale up to 10
>> million of these pairs.
>>
>> Every 10 minutes I will update all of these values using their keys.
>> This is the reason I cannot go for a database as a solution.
>
> I wouldn't be so quick to dismiss a database. All your big telcos run their
> mobile phone systems on databases, where the big issue is having enough
> memory for the DB to stay in memory; some dedicated databases (e.g.
> TimesTen) are designed to have bounded latency on lookup, so you can
> predict how long operations will take.
>
> That said, if you are only doing atomic updates of a single record, there
> is less need for the advanced features. Assuming >1 machine, some kind of
> distributed hash table may work.
>
>> I was thinking about going with a memcache pool.
>> In the meantime I heard about Hadoop and wanted to get advice from this
>> mailing list regarding memcache pool vs Hadoop for this specific problem.
>
> It's not an area Hadoop deals with at all.
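Steve's "distributed hash table" suggestion - which is essentially what a memcache pool gives you - can be sketched with a consistent-hash ring that maps each key to a cache node, so adding or removing a node only remaps a fraction of the 10M keys. This is a hypothetical sketch: the node names, replica count, and choice of FNV-1a hash are my own assumptions, not anything from the thread.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical consistent-hash ring for sharding keys across cache nodes.
// Each node is placed on the ring at REPLICAS "virtual" positions; a key is
// served by the first node at or after its hash, wrapping around at the end.
public class HashRing {
    private static final int REPLICAS = 100; // virtual nodes per server (assumption)
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < REPLICAS; i++)
            ring.put(hash(node + "#" + i), node);
    }

    public String nodeFor(String key) {
        int h = hash(key);
        SortedMap<Integer, String> tail = ring.tailMap(h); // nodes at or after h
        return tail.isEmpty() ? ring.get(ring.firstKey())  // wrap to ring start
                              : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // FNV-1a folded to a non-negative int; any stable hash would do
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h & 0x7fffffff;
    }
}
```

With a ring like this in front of several memcached (or plain in-process map) instances, the 10-minute bulk refresh can be fanned out per node, which is roughly the division of labour the "3 machines" estimate earlier in the thread implies.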
