Not sure if the in-memory approach will provide the equivalent of full-text
indexing, but it's worth a try. Another design pattern is to install a single
DB and have all the nodes connect to it; I have done this with Postgres on a
40-ish node Hadoop cluster. Queries against the DB's full-text index are not
that expensive for MySQL: it's not a complex query, just a seek on the
full-text index. But, of course, it depends on how much concurrency the DB
will get, which depends on how much data, how many nodes, and how many tasks
you have.
Generically, I think the right answer is to make the connection behind the
GeoEntityLinker configurable: in mem || remote db || localhost db.
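A minimal sketch of what that configurable connection could look like, in Java. The class, enum, and property names (`GazetteerConnectionConfig`, `gazetteer.mode`, `gazetteer.host`, `gazetteer.port`, `gazetteer.db`) are hypothetical and not part of the GeoEntityLinker API; it also assumes H2 for the in-memory case and MySQL for the localhost and remote cases. It only builds the JDBC URL; opening the actual connection with `DriverManager.getConnection` would need the matching driver on the classpath.

```java
import java.util.Properties;

// Hypothetical sketch: choose between the three connection modes
// (in mem || remote db || localhost db) from configuration.
// Names are illustrative assumptions, not the GeoEntityLinker API.
public class GazetteerConnectionConfig {

    enum Mode { IN_MEMORY, LOCALHOST_DB, REMOTE_DB }

    // Builds a JDBC URL for the configured mode.
    // Assumption: H2 for in-memory, MySQL for localhost/remote.
    static String jdbcUrl(Properties props) {
        Mode mode = Mode.valueOf(props.getProperty("gazetteer.mode", "IN_MEMORY"));
        String db = props.getProperty("gazetteer.db", "gazetteer");
        switch (mode) {
            case IN_MEMORY:
                // DB_CLOSE_DELAY=-1 keeps the H2 in-memory DB alive until the JVM exits
                return "jdbc:h2:mem:" + db + ";DB_CLOSE_DELAY=-1";
            case LOCALHOST_DB:
                return "jdbc:mysql://localhost:3306/" + db;
            case REMOTE_DB:
                String host = props.getProperty("gazetteer.host");
                if (host == null) {
                    throw new IllegalArgumentException("gazetteer.host is required for REMOTE_DB");
                }
                String port = props.getProperty("gazetteer.port", "3306");
                return "jdbc:mysql://" + host + ":" + port + "/" + db;
            default:
                throw new IllegalStateException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gazetteer.mode", "REMOTE_DB");
        props.setProperty("gazetteer.host", "db.example.org");
        // prints jdbc:mysql://db.example.org:3306/gazetteer
        System.out.println(jdbcUrl(props));
    }
}
```

With this shape, switching a Hadoop job from a shared remote DB to a per-node in-memory DB is a one-line config change rather than a code change.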



On Wed, Oct 23, 2013 at 8:46 AM, Jörn Kottmann wrote:

> On 10/23/2013 01:14 PM, Mark G wrote:
>
>> All that being said, it is totally possible to run an in-memory version of
>> the gazetteer. Personally, I like the DB approach; it provides a lot of
>> flexibility and power.
>>
>
> Yes, and you can even run a DB in-memory, which works with the current
> implementation. I think I will experiment with that.
>
> I don't really mind using 3 GB of memory for it, since my Hadoop servers
> have more than enough anyway, and it makes deployment easier (I don't have
> to deal with installing MySQL databases and keeping them in sync).
>
> Jörn
>
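On the question of whether an in-memory gazetteer can match a full-text index: a minimal sketch, assuming a plain Java token-level inverted index. This is a swapped-in technique for illustration, not the current implementation, and the class name `InMemoryGazetteer` is hypothetical. It answers the same kind of lookup as a full-text index seek (every query token must match a name), but it has no stemming, stop words, or relevance ranking, so it is only a rough equivalent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: in-memory gazetteer backed by an inverted index
// (token -> ids of names containing that token). Boolean AND matching only.
public class InMemoryGazetteer {

    private final List<String> names = new ArrayList<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Lower-cases and splits on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    public void add(String placeName) {
        int id = names.size();
        names.add(placeName);
        for (String token : tokenize(placeName)) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new HashSet<>()).add(id);
            }
        }
    }

    // Returns the names that contain every token of the query, in insertion order.
    public List<String> search(String query) {
        Set<Integer> hits = null;
        for (String token : tokenize(query)) {
            if (token.isEmpty()) {
                continue;
            }
            Set<Integer> postings = index.getOrDefault(token, Set.of());
            if (hits == null) {
                hits = new HashSet<>(postings);
            } else {
                hits.retainAll(postings); // intersect posting lists
            }
        }
        List<String> result = new ArrayList<>();
        if (hits == null) {
            return result; // empty query
        }
        hits.stream().sorted().forEach(id -> result.add(names.get(id)));
        return result;
    }

    public static void main(String[] args) {
        InMemoryGazetteer g = new InMemoryGazetteer();
        g.add("New York");
        g.add("York");
        g.add("New Delhi");
        System.out.println(g.search("new york")); // [New York]
    }
}
```

Memory use grows with the number of names and tokens, which is the trade-off discussed above; running an actual DB in-memory instead would keep the existing SQL queries and full-text behavior.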
