One more alternative, though I am not sure if anyone
is using it.
Compass (the open-source search framework built on
Lucene; as far as I know it is not an Apache project)
has added a plug-in that allows storing Lucene index
files inside a database. This should work in a
clustered environment, as all nodes will share the
same database instance.
I am not sure what impact it will have on performance.
Is anyone using a database for index storage? Are
there any drawbacks to this approach?
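
To make the idea concrete, here is a rough toy sketch of what "index files
in a shared database" means. It is written in Python with SQLite purely for
illustration and is NOT the Compass plug-in's API; the table name
index_files, the class DbIndexStore, and the file names are all made up.
Each index file becomes one row holding a BLOB, and every node that
connects to the same database sees the same set of files.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per index file, contents stored as a BLOB.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS index_files "
    "(name TEXT PRIMARY KEY, data BLOB NOT NULL)"
)


class DbIndexStore:
    """Toy stand-in for a database-backed index directory (not the Compass API)."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(SCHEMA)
        self.conn.commit()

    def write_file(self, name, data):
        # Insert or overwrite one index file; the `with` block commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO index_files (name, data) VALUES (?, ?)",
                (name, data),
            )

    def read_file(self, name):
        row = self.conn.execute(
            "SELECT data FROM index_files WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def list_files(self):
        rows = self.conn.execute("SELECT name FROM index_files ORDER BY name")
        return [r[0] for r in rows]

    def close(self):
        self.conn.close()


def demo():
    # Two "nodes" open separate connections to the same database file.
    db_path = os.path.join(tempfile.mkdtemp(), "index.db")
    node_a = DbIndexStore(db_path)
    node_b = DbIndexStore(db_path)
    # Node A writes two (made-up) index files.
    node_a.write_file("segments_1", b"segment metadata")
    node_a.write_file("_0.cfs", b"postings and stored fields")
    # Node B sees them without any copying or messaging between nodes.
    seen = node_b.list_files()
    payload = node_b.read_file("_0.cfs")
    node_a.close()
    node_b.close()
    return seen, payload
```

The sketch leaves out the hard parts: write locking between nodes, and the
cost of doing random reads and writes through BLOB columns, which is where
I would expect the performance impact to show up.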
Regards,
Rajesh
--- Zach Bailey <[EMAIL PROTECTED]> wrote:
> Thanks for your response --
>
> Based on my understanding, Hadoop and Nutch are essentially the same
> thing, with Nutch being derived from Hadoop, and are primarily
> intended to be standalone applications.
>
> We are not looking for a standalone application; rather, we must use
> a framework to implement search inside our current content management
> application. Currently the application's search functionality is
> designed and built around Lucene, so migrating frameworks at this
> point is not feasible.
>
> We are currently re-working our back-end to support clustering (in
> Tomcat) and we are looking for information on the migration of Lucene
> from a single-node filesystem index (which is what we use now and hope
> to continue to use for clients with a single-node deployment) to a
> shared filesystem index on a mounted network share.
>
> We prefer to use this strategy because it means we do not have to
> have two disparate methods of managing indexes for clients who run in
> a single-node, non-clustered environment versus clients who run in a
> multiple-node, clustered environment.
>
> So, hopefully here are some easy questions someone
> could shed some light on:
>
> Is this not a recommended method of managing indexes
> across multiple nodes?
>
> At this point would people recommend storing an individual index on
> each node and propagating index updates via a JMS framework rather
> than attempting to handle it transparently with a single shared index?
>
> Is the Lucene index code so intimately tied to filesystem semantics
> that using a shared/networked file system is infeasible at this point
> in time?
>
> Which of these strategies (JMS vs. shared FS) would have the quickest
> time-to-implementation? Which would be the most robust/least
> error-prone?
>
> I really appreciate any insight or response anyone can provide, even
> if it is a short answer to any of the related topics, e.g. "we
> implemented clustered search using per-node indexing with JMS update
> propagation and it works great", or even something as simple as
> "don't use a shared filesystem at this point".
>
> Cheers,
> -Zach
>
> testn wrote:
> > Why don't you check out Hadoop and Nutch? It should provide what
> > you are looking for.
>
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
>
>