[ https://issues.apache.org/jira/browse/JCR-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511401 ]

Marcel Reutegger commented on JCR-905:
--------------------------------------

This patch adds considerable overhead to the indexing process because, for 
each added node, the index first has to check whether the node already exists. 
In Lucene terms this means that many index readers and index writers are 
created and destroyed within a short period of time. The current code relies 
on the events passed to the query handler reflecting a correct state change 
in the workspace. For example, if an event says that a node was added, the 
index assumes that the node does not yet exist in the index.
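To make the failure mode concrete, here is a minimal, hypothetical Java model of 
an index that trusts "node added" events. It is not Jackrabbit code: the names 
NaiveIndex, reindex and onNodeAdded are invented for illustration. It assumes 
the sequence suggested by the reproduction steps in the issue: cluster node 2 
starts with an empty index, re-indexes the shared workspace (which already 
contains the node added by cluster node 1), and then processes cluster node 1's 
journal record for that same node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of a search index that trusts workspace events.
 * Documents are kept in a list, so nothing prevents two entries with the same UUID.
 */
public class NaiveIndex {
    private final List<String> documents = new ArrayList<>();

    /** Rebuilds the index from scratch from the nodes currently in the workspace. */
    public void reindex(Set<String> workspaceNodeUuids) {
        documents.clear();
        documents.addAll(workspaceNodeUuids);
    }

    /** Handles a "node added" event: assumes the node is not indexed yet, so no lookup. */
    public void onNodeAdded(String uuid) {
        documents.add(uuid);
    }

    /** Returns how many index entries exist for the given UUID. */
    public int countEntries(String uuid) {
        return Collections.frequency(documents, uuid);
    }

    public static void main(String[] args) {
        NaiveIndex index = new NaiveIndex();
        // Cluster node 2 re-indexes the shared workspace, which already holds the node.
        index.reindex(Set.of("uuid-1"));
        // The journal then delivers cluster node 1's "node added" event for the same node.
        index.onNodeAdded("uuid-1");
        System.out.println(index.countEntries("uuid-1")); // prints 2
    }
}
```

The duplicate appears without any per-node existence check going wrong: the 
event is simply applied to an index that already reflects it.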

I see two ways to fix this issue:

- The query handler does not automatically re-index the workspace, but instead 
replays the cluster journal to build a valid index.
- The query handler associates a journal revision with the current index 
state. When processing journal events, the query handler ignores events from 
the 'past', i.e. events that are already reflected in the index.

I prefer option 2.
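A minimal sketch of option 2 might look like the following. Everything here is 
hypothetical (RevisionAwareIndex, onNodeAdded and the long revision numbers are 
assumptions, not Jackrabbit's API). It also assumes the re-index step knows 
which journal revision its workspace snapshot corresponds to; obtaining that 
revision consistently is the hard part in a real cluster and is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of option 2: the index remembers the journal revision
 * its contents reflect and skips journal records at or below that revision.
 */
public class RevisionAwareIndex {
    private final List<String> documents = new ArrayList<>();
    // Journal revision the current index state corresponds to; -1 means none yet.
    private long indexedRevision = -1;

    /** Rebuilds the index and records the journal revision of the workspace snapshot (assumed known). */
    public void reindex(Set<String> workspaceNodeUuids, long snapshotRevision) {
        documents.clear();
        documents.addAll(workspaceNodeUuids);
        indexedRevision = snapshotRevision;
    }

    /** Applies a "node added" journal record unless it is from the index's past. */
    public boolean onNodeAdded(long revision, String uuid) {
        if (revision <= indexedRevision) {
            return false; // already reflected in the index: ignore it
        }
        documents.add(uuid); // still no per-node existence check needed
        indexedRevision = revision;
        return true;
    }

    public long getIndexedRevision() {
        return indexedRevision;
    }

    /** Returns how many index entries exist for the given UUID. */
    public int countEntries(String uuid) {
        return Collections.frequency(documents, uuid);
    }

    public static void main(String[] args) {
        RevisionAwareIndex index = new RevisionAwareIndex();
        // Re-index a workspace that already contains the change from journal revision 1.
        index.reindex(Set.of("uuid-1"), 1);
        // Replaying the journal delivers revision 1 again: it is skipped.
        index.onNodeAdded(1, "uuid-1");
        // A genuinely new change at revision 2 is applied.
        index.onNodeAdded(2, "uuid-2");
        System.out.println(index.countEntries("uuid-1")); // prints 1
        System.out.println(index.countEntries("uuid-2")); // prints 1
    }
}
```

The design point is that filtering happens once per journal record with a 
single comparison, so the indexing path keeps its cheap "add without lookup" 
behaviour instead of opening index readers for every added node.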

> Clustering: race condition may cause duplicate entries in search index
> ----------------------------------------------------------------------
>
>                 Key: JCR-905
>                 URL: https://issues.apache.org/jira/browse/JCR-905
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: clustering
>    Affects Versions: 1.3
>            Reporter: Martijn Hendriks
>         Attachments: JCR-905.patch, log1.txt, log2.txt
>
>
> There seems to be a race condition that may cause duplicate search index 
> entries. It is reproducible as follows (Jackrabbit 1.3):
> 1) Start cluster node 1, which just adds a single node of node type 
> clustering:test.
> 2) Shut down cluster node 1.
> 3) Start cluster node 2 with an empty search index.
> 4) Execute the query //element(*, clustering:test).
> 5) Print the result of the query (UUIDs of nodes in the result set).
> When I simply run cluster node 2, the result set contains one node, as 
> expected. However, when I debug cluster node 2 with a breakpoint (i.e., a 
> pause of a few seconds) at line 306 of RepositoryImpl.java, just before the 
> cluster node is started, the result set contains two results, both with 
> the same UUID.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
