I'm not sure the in-memory approach will give you the equivalent of full-text indexing, but it's worth a try. Another design pattern is to install a single DB and have all the nodes connect to it; I have done this with Postgres on a 40ish-node Hadoop cluster. The queries against the DB's full-text index are not that expensive for MySQL: it's not a complex query, just a seek on the full-text index. Of course, it depends on how much concurrency the DB will get, which in turn depends on how much data and how many nodes and tasks you have.

Generically, I think the right answer is to make the connection behind the GeoEntityLinker configurable: in-memory || remote DB || localhost DB
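To make the "configurable connection" idea concrete, here is a rough sketch of how the three options could be resolved to a JDBC URL from a properties file. Everything in it is an assumption for illustration: the class name, the `gazetteer.*` property keys, and the use of an H2 in-memory URL for the in-memory case are all hypothetical and are not part of the current GeoEntityLinker. It only builds the URL string; it does not open a connection, and it says nothing about whether an in-memory DB would offer an equivalent full-text index.

```java
import java.util.Properties;

public class GazetteerConnectionConfig {

    /** The three deployment options discussed in this thread. */
    public enum Mode { IN_MEMORY, REMOTE_DB, LOCALHOST_DB }

    /**
     * Resolves a JDBC URL from configuration.
     * All property keys ("gazetteer.mode", "gazetteer.host", ...) are hypothetical.
     */
    public static String jdbcUrl(Properties props) {
        Mode mode = Mode.valueOf(props.getProperty("gazetteer.mode", "LOCALHOST_DB"));
        String db = props.getProperty("gazetteer.db", "gazetteer");
        String port = props.getProperty("gazetteer.port", "3306");

        switch (mode) {
            case IN_MEMORY:
                // Assumption: an embedded H2 database held in the JVM's memory.
                return "jdbc:h2:mem:" + db;
            case REMOTE_DB:
                // One shared DB server that every node in the cluster connects to.
                String host = props.getProperty("gazetteer.host");
                if (host == null) {
                    throw new IllegalArgumentException("gazetteer.host is required for REMOTE_DB");
                }
                return "jdbc:mysql://" + host + ":" + port + "/" + db;
            case LOCALHOST_DB:
            default:
                // A DB instance installed on each node.
                return "jdbc:mysql://localhost:" + port + "/" + db;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gazetteer.mode", "REMOTE_DB");
        props.setProperty("gazetteer.host", "db.example.org");
        System.out.println(jdbcUrl(props)); // jdbc:mysql://db.example.org:3306/gazetteer
    }
}
```

With something like this, switching a Hadoop job between the shared DB and a per-node or in-memory DB would be a config change rather than a code change.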
On Wed, Oct 23, 2013 at 8:46 AM, Jörn Kottmann <[email protected]> wrote:

> On 10/23/2013 01:14 PM, Mark G wrote:
>
>> All that being said, it is totally possible to run an in memory version of
>> the gazateer. Personally, I like the DB approach, it provides a lot of
>> flexibility and power.
>>
>
> Yes, and you can even use a DB to run in-memory which works with the
> current implementation,
> I think I will experiment with that.
>
> I don't really mind using 3 GB memory for it, since my Hadoop servers have
> more than enough anyway,
> and it makes the deployment easier (don't have to deal with installing
> MySQL databases and keeping them in sync).
>
> Jörn
>
