Marcel,
I'm replying to the list rather than Jira, since this is OT wrt JCR-169.
So, if you had, for example, 50 x 200MB of Lucene index segments and wanted
them to be accessible in a cluster environment, would Jackrabbit be a
good place to put those segments?
The big killer for Lucene is (I think) whether it can seek efficiently
within the centrally stored blob, but presumably choosing the right Binary
storage strategy gives you that partially for free?
If this is the case, I could replace my slightly odd segment
distribution mechanism with Jackrabbit.
Last question:
Is JCR-169 being actively worked on?
Is there an area where another pair of hands would help? I would like
to be able to deploy Jackrabbit in a cluster.
Ian
Marcel Reutegger (JIRA) wrote:
[ http://issues.apache.org/jira/browse/JCR-169?page=comments#action_12432083 ]
Marcel Reutegger commented on JCR-169:
--------------------------------------
Ian, thanks a lot for your comments.
Here are my current thoughts on clustering the search index in Jackrabbit:
I think the preferred approach is to put the index into the repository itself.
See: http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/8530 and
following messages
This would also allow us to distribute index updates to cluster nodes using the
repository's internal observation mechanism, e.g. an update to a deleted
documents file or the arrival of new index segments.
I found the best indexing strategy was to have local copies of segments, stored
centrally as masters.
I agree. In particular, the design of Lucene, where index files are only created
and never modified, supports this approach very nicely.
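To illustrate why write-once files make the "local copies, central masters" approach simple, here is a minimal sketch in plain Java. It is not Jackrabbit or Lucene code: SegmentSync, the two directories and the file names are hypothetical, and it assumes every index file is written once under a unique name. Any file that is rewritten in place (the deleted documents file mentioned above, for instance) would need separate handling, such as a checksum or version comparison.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: mirrors write-once index files from a master directory to a local copy. */
final class SegmentSync {

    private SegmentSync() {}

    /** Returns the names of the regular files directly inside dir. */
    static Set<String> fileNames(Path dir) throws IOException {
        Set<String> names = new HashSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }

    /**
     * Copies files that exist only in master and deletes local files that
     * master no longer has. Because files are assumed never to change after
     * creation, comparing names is enough; contents are never compared.
     * Returns the sorted names of the files that were copied.
     */
    static List<String> sync(Path master, Path local) throws IOException {
        Set<String> masterNames = fileNames(master);
        Set<String> localNames = fileNames(local);

        List<String> copied = new ArrayList<>();
        for (String name : masterNames) {
            if (!localNames.contains(name)) {
                Files.copy(master.resolve(name), local.resolve(name));
                copied.add(name);
            }
        }
        for (String name : localNames) {
            if (!masterNames.contains(name)) {
                Files.delete(local.resolve(name));
            }
        }
        Collections.sort(copied);
        return copied;
    }
}
```

A cluster node could run something like SegmentSync.sync(master, local) whenever it is told that the master set has changed. A second call with nothing new copies nothing, so repeated notifications are harmless.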
In the search application, the speed of segment updates is not that critical;
you probably have a different requirement in JCR.
JCR is more restrictive in that respect, at least if we want to be compliant
with the specification. As soon as a node is created in the workspace it must
be searchable using a query. For most real-life systems this is not a hard
requirement though. E.g. when a document is added to a repository, it usually
doesn't matter if it is retrievable by query only after a couple of seconds and
not right away.
Make Jackrabbit clusterable
---------------------------
Key: JCR-169
URL: http://issues.apache.org/jira/browse/JCR-169
Project: Jackrabbit
Issue Type: New Feature
Components: core
Reporter: Marcel Reutegger
Priority: Minor
This Jira issue discusses the technical implications of introducing clustering
for the current design of Jackrabbit.
Particularly the following areas require thorough investigation:
- SharedItemStateManager and its cache
  - cache integrity
  - cache design: look aside, write through?
  - hook for distributed cache, interface?
  - isolation level
  - transaction integrity within Jackrabbit, interaction with transient layer
- VirtualItemStateProvider
  - same strategy as SharedItemStateManager?
- Search index
  - single or per cluster node index?
- Observation
Please state more areas if needed.
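
For anyone unfamiliar with the two cache designs named in the list, the difference can be sketched in a few lines of plain Java. This is only an illustration: CacheSketch, Store, LookAsideCache and WriteThroughCache are hypothetical names and have nothing to do with how SharedItemStateManager is actually implemented. Store stands in for the persistence layer and counts loads so that cache hits are visible.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Jackrabbit. */
final class CacheSketch {

    private CacheSketch() {}

    /** Stand-in for the persistence layer; counts loads so cache hits are visible. */
    static final class Store {
        private final Map<String, String> data = new HashMap<>();
        int loads;

        String load(String id) {
            loads++;
            return data.get(id);
        }

        void save(String id, String state) {
            data.put(id, state);
        }
    }

    /** Look-aside: reads fill the cache on a miss; writes go to the store and evict the entry. */
    static final class LookAsideCache {
        private final Store store;
        private final Map<String, String> cache = new HashMap<>();

        LookAsideCache(Store store) {
            this.store = store;
        }

        String get(String id) {
            String state = cache.get(id);
            if (state == null) {
                state = store.load(id);
                if (state != null) {
                    cache.put(id, state);
                }
            }
            return state;
        }

        void put(String id, String state) {
            store.save(id, state);
            cache.remove(id); // the next read reloads from the store
        }
    }

    /** Write-through: every write updates the store and the cache together. */
    static final class WriteThroughCache {
        private final Store store;
        private final Map<String, String> cache = new HashMap<>();

        WriteThroughCache(Store store) {
            this.store = store;
        }

        String get(String id) {
            String state = cache.get(id);
            if (state == null) {
                state = store.load(id);
                if (state != null) {
                    cache.put(id, state);
                }
            }
            return state;
        }

        void put(String id, String state) {
            store.save(id, state);
            cache.put(id, state); // the cache is current immediately after the write
        }
    }
}
```

In both variants the sketch covers a single node only. In a cluster, the other nodes would still need to be told to drop or refresh their own entries, which is where the "hook for distributed cache" question comes in.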