On 4/20/07, Danny Burkes <[EMAIL PROTECTED]> wrote:
> Hi-
>
> I'm technical lead at Lingr (http://www.lingr.com), a chatroom-based
> social networking site.  We've currently got several million user
> utterances stored in MySQL, and we're looking to build a local search
> functionality.  I've played around with aaf and I really like it, but I
> have some questions.
>
>
> 1.  Is anyone out there using aaf to index a corpus of this size?  If
> so, how has your scaling experience been?

Yes. I have several models with more than 4M rows, all indexed with AAF.
My experience has been that AAF is very stable. Most of my challenges
have been with Ferret upgrades breaking the index format.

> 2.  We would be running one central aaf server instance, talking to it
> over drb from our many application servers.  We add tens of thousands of
> utterances per day- anyone out there indexing this many items on a daily
> basis over drb?  If so, how has your experience been in terms of
> stability?

Yes. Rock solid.

> 3.  All of our utterance data is in UTF8, but we don't know what
> language a particular utterance is in.  It's common to have both latin
> and non-latin text even in the same room.  How can I index both types of
> strings effectively within the same model field index?

Why not just use UTF8?
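
To make that concrete, here is a rough sketch of language-agnostic
tokenizing over UTF-8 text in plain Ruby. It is not Ferret or AAF code:
`tokenize_mixed` is a name I made up, and it assumes that treating each
CJK character as its own token is acceptable for your searches. Ferret's
own analyzers may well behave differently.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Splits UTF-8 text into tokens without knowing its language.
TOKEN = /
  [\p{Han}\p{Hiragana}\p{Katakana}]                    # one CJK character per token
  |
  [\p{L}\p{N}&&[^\p{Han}\p{Hiragana}\p{Katakana}]]+    # run of other letters or digits
/x

def tokenize_mixed(text)
  # String#downcase is Unicode-aware on Ruby 2.4 and later.
  text.scan(TOKEN).map(&:downcase)
end

p tokenize_mixed("Hello 世界, привет!")
```

The same token stream can then go into a single field, whichever
languages show up in a room.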

> 4.  Any suggestions on how to build the initial index in an offline way?
> I suspect it will probably take many hours to build the initial index.

Jens has talked about developing a better rebuild_index for AAF that does this.

However, if your search system isn't online (i.e., the feature isn't
enabled in the front end), why would you need anything special? The
AAF DRb server can serve requests while you're running a rebuild (as
long as you don't use the current rebuild_index method).
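
One generic way to rebuild without touching the live index is to build
into a scratch directory and swap it in at the end. This is only a
sketch of that idea, not how AAF's rebuild_index works or will work:
`rebuild_and_swap` and the directory names are made up, and the block
stands in for whatever actually writes the index files.

```ruby
require "fileutils"

# Hypothetical helper, not part of acts_as_ferret.
# Builds a fresh index in a scratch directory, then swaps it into place.
def rebuild_and_swap(live_dir)
  build_dir = "#{live_dir}.rebuild"
  old_dir   = "#{live_dir}.old"

  FileUtils.rm_rf(build_dir)
  FileUtils.mkdir_p(build_dir)
  yield build_dir                       # caller writes the new index here

  FileUtils.rm_rf(old_dir)
  File.rename(live_dir, old_dir) if File.exist?(live_dir)
  File.rename(build_dir, live_dir)      # the new index becomes the live one
  FileUtils.rm_rf(old_dir)
  live_dir
end
```

There is a short window between the two renames where the live
directory is missing, and a running server would still need to reopen
the index afterwards, so treat this as a starting point only.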

> 5.  I suspect we will have to disable_ferret(:always) on our utterance
> model, then update the index manually on some periodic basis (cron job,
> backgroundrb worker, etc.).  The reason for this is that we don't want
> to introduce any delay into the process of storing a new utterance,
> which occurs in realtime during a chat session.  Anyone have experience
> doing this?

It's pretty fast. The only time you'd see a slowdown is when you
encounter a lock in the DRb server.
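
If you do end up deferring index updates as you describe, the pattern
looks roughly like the sketch below. None of it is AAF API:
`DeferredIndexer` is a made-up class, and the block you pass in stands
in for whatever call actually updates the index. It also keeps pending
ids in memory to stay short; with several application servers you
would more likely track them in the database.

```ruby
require "thread"

# Hypothetical class, not part of acts_as_ferret.
class DeferredIndexer
  def initialize(batch_size: 500, &index_batch)
    @queue = Queue.new
    @batch_size = batch_size
    @index_batch = index_batch   # stand-in for the real index update
  end

  # Called when an utterance is saved: records the id and returns at once.
  def enqueue(id)
    @queue << id
  end

  # Called from a cron job or background worker (single consumer assumed).
  # Drains pending ids in batches and returns how many were handed over.
  def flush
    total = 0
    until @queue.empty?
      batch = []
      batch << @queue.pop(true) while batch.size < @batch_size && !@queue.empty?
      @index_batch.call(batch)
      total += batch.size
    end
    total
  end
end
```

Saving an utterance then costs one `enqueue` call, and the indexing
work happens whenever `flush` runs.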

-ryan
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
