Aaron McCurry created BLUR-61:
---------------------------------

             Summary: Remove sessions from the 0.2 code
                 Key: BLUR-61
                 URL: https://issues.apache.org/jira/browse/BLUR-61
             Project: Apache Blur
          Issue Type: Bug
            Reporter: Aaron McCurry


There was a discussion on the mailing list about maintaining sessions in the 
0.2 code.

http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201302.mbox/%3ccag_bhoy3_vdtv1jmfbscu-7mob4i9pm6dlof5di6ousgmpj...@mail.gmail.com%3E

I would like to remove the need for sessions from the code.  I propose that we 
accomplish this by including the segment in the document location throughout 
the API.

Background: this is really an issue with Lucene and how it deals with mutations 
on the index.  Let me provide an example:

1. Document A gets added to the index, and let's say that it lands in the 
Lucene segment "aa".  Through a bit of math it becomes document id 3570586 in 
the overall index, but it is actually document id 304 in the "aa" segment.

2. A search gets executed, an index snapshot is created, and Document A is 
reported in the search results as a hit at 3570586.

3. Now say that the document id is reported to another system, and later that 
system wants to fetch the data for the hit.

4. Now a merge occurs and "aa" is merged with one or more other segments.

5. Then the other system asks to fetch document 3570586.  A new snapshot of 
the index is created and document id 3570586 is looked up in it.  But it's 
very likely that the wrong document is fetched; only by blind luck would it be 
the right one.
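
The steps above can be modeled with a small sketch.  This is not Lucene code; 
it is a toy model in Python that assumes an index-wide document id is the 
segment's starting offset plus the id inside the segment.  The segment names 
"a9" and "ab", the segment sizes, and the 100 dropped deletions are made up 
for illustration; only the ids 3570586 and 304 come from the example.

```python
from bisect import bisect_right


def resolve(segments, global_id):
    """Map an index-wide document id to (segment name, id within the segment).

    `segments` is an ordered list of (name, document_count) pairs.  Each
    segment's base offset is the sum of the counts of the segments before it.
    """
    bases = []
    total = 0
    for _name, count in segments:
        bases.append(total)
        total += count
    if not 0 <= global_id < total:
        raise IndexError(global_id)
    i = bisect_right(bases, global_id) - 1
    return segments[i][0], global_id - bases[i]


# Hypothetical layout before the merge: "aa" starts at offset 3570282,
# so local id 304 shows up as index-wide id 3570282 + 304 = 3570586.
before = [("a9", 3570282), ("aa", 1000)]

# Hypothetical layout after the merge: "a9" and "aa" were combined into "ab",
# and 100 deleted documents from "a9" were dropped along the way, so every
# document that used to live in "aa" moved down by 100.
after = [("ab", 3570282 - 100 + 1000)]

print(resolve(before, 3570586))  # Document A: segment "aa", local id 304
print(resolve(after, 3570586))   # same number, but no longer Document A
```

In this model Document A ends up at index-wide id 3570486 after the merge, so 
a client still holding 3570586 silently gets a different document.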

Currently in the Blur 0.2 code we get around this problem by storing the index 
snapshot in a session on each server.  So during a session the index that the 
session reads from cannot change.

Back to my proposed change of adding the segment to the document location.  The 
new document location will include [ shard index / segment name / document id 
in the segment (not the overall index document id) ].  On the server side we 
would keep old segments around for a certain amount of time after their last 
access, basically an LRU cache.  That way if a segment is deleted and another 
system still asks for data from an old segment, the data can still be retrieved.
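
A minimal sketch of the proposal, again as a toy Python model and not as Blur 
or Lucene code.  The names DocumentLocation, SegmentCache, add_live, retire 
and fetch are hypothetical, segments are stood in for by plain dicts, and the 
cache is assumed to belong to a single shard, so the shard field is carried 
but not used for lookup.

```python
from collections import OrderedDict, namedtuple

# [ shard index / segment name / document id in the segment ]
DocumentLocation = namedtuple("DocumentLocation", "shard segment doc_id")


class SegmentCache:
    """Keeps merged-away segments readable until `ttl_seconds` after last access."""

    def __init__(self, ttl_seconds, clock):
        self._ttl = ttl_seconds
        self._clock = clock             # callable returning the current time
        self._live = {}                 # segment name -> {doc_id: document}
        self._retired = OrderedDict()   # name -> (docs, last_access), oldest first

    def add_live(self, name, docs):
        self._live[name] = docs

    def retire(self, name):
        """Called when a merge replaces the segment; keep it around for a while."""
        self._retired[name] = (self._live.pop(name), self._clock())

    def fetch(self, location):
        self._evict_expired()
        if location.segment in self._live:
            return self._live[location.segment][location.doc_id]
        docs, _last = self._retired[location.segment]  # KeyError once expired
        # Refresh the access time and move the entry to the "recent" end.
        self._retired[location.segment] = (docs, self._clock())
        self._retired.move_to_end(location.segment)
        return docs[location.doc_id]

    def _evict_expired(self):
        now = self._clock()
        while self._retired:
            name, (_docs, last) = next(iter(self._retired.items()))
            if now - last < self._ttl:
                break                   # everything after this is more recent
            del self._retired[name]


now = [0.0]                             # fake clock so the example is repeatable
cache = SegmentCache(ttl_seconds=60, clock=lambda: now[0])
cache.add_live("aa", {304: "Document A"})
location = DocumentLocation(shard=0, segment="aa", doc_id=304)

cache.retire("aa")                      # "aa" was merged into another segment
now[0] = 30.0
print(cache.fetch(location))            # still served from the retired segment
```

A location that names the segment stays valid across a merge for as long as 
the old segment is retained, which is what removes the need for a session.  
Once the retention time passes without an access, the lookup fails, and the 
caller would have to run the search again.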



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
