Aaron McCurry created BLUR-61:
---------------------------------
Summary: Remove sessions from the 0.2 code
Key: BLUR-61
URL: https://issues.apache.org/jira/browse/BLUR-61
Project: Apache Blur
Issue Type: Bug
Reporter: Aaron McCurry
There was a discussion on the mailing list about maintaining sessions in
the 0.2 code.
http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201302.mbox/%3ccag_bhoy3_vdtv1jmfbscu-7mob4i9pm6dlof5di6ousgmpj...@mail.gmail.com%3E
I would like to remove the need for sessions from the code. I propose that we
accomplish this by including the segment in the document location
throughout the API.
Background: this is really an issue with Lucene and how it deals with mutations
to the index. Let me provide an example:
1. Document A gets added to the index, and let's say that it gets added to the
Lucene segment "aa". Through a bit of math it becomes document id 3570586 in
the overall index, but it is actually document id 304 in the "aa" segment.
2. A search gets executed, an index snapshot is created, and Document A is
reported in the search results as a hit at 3570586.
3. Now say that the document id is reported to another system, and later that
system wants to fetch the data for the hit.
4. Meanwhile a merge occurs, and the "aa" segment is merged with one or more
other segments.
5. Then the other system asks to fetch document 3570586. A new snapshot of the
index is created and document id 3570586 is looked up in it. But it's very
likely that the wrong document gets fetched (only by blind luck will it be the
right one).
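The "bit of math" and the effect of the merge can be sketched in plain Java,
without depending on Lucene itself. Everything in this sketch is an assumption
made for illustration: the Segment class, the resolve method, the segment names
"a0", "a9" and "ab", and the document counts, which were picked only so that
"aa" starts at offset 3570282 and local id 304 lands on 3570586. It also
assumes the merge drops 282 deleted documents, which is one way the ids can
shift.

```java
import java.util.List;

public class DocIdDrift {
    // Hypothetical stand-in for a Lucene segment: a name and a document count.
    static final class Segment {
        final String name;
        final int docCount;

        Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    // Resolve an index-wide document id to "segmentName/localId" by walking
    // the segments of one snapshot in order.
    static String resolve(List<Segment> snapshot, int globalId) {
        int docBase = 0; // number of documents in all preceding segments
        for (Segment s : snapshot) {
            if (globalId < docBase + s.docCount) {
                return s.name + "/" + (globalId - docBase);
            }
            docBase += s.docCount;
        }
        throw new IllegalArgumentException("no such document: " + globalId);
    }

    public static void main(String[] args) {
        // Snapshot used by the search: "aa" starts at 3000000 + 570282 = 3570282.
        List<Segment> beforeMerge = List.of(
            new Segment("a0", 3000000),
            new Segment("a9", 570282),
            new Segment("aa", 1000));
        System.out.println(resolve(beforeMerge, 3570586)); // aa/304

        // Snapshot used by the later fetch: "a9" and "aa" were merged into
        // "ab", and 282 deleted documents were dropped along the way.
        List<Segment> afterMerge = List.of(
            new Segment("a0", 3000000),
            new Segment("ab", 571000));
        System.out.println(resolve(afterMerge, 3570586)); // ab/570586
    }
}
```

Under these assumptions the same overall id resolves to a different position
after the merge, so the fetch would return some document other than Document A.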
Currently in the Blur 0.2 code we get around this problem by storing the index
snapshot in a session on each server, so the index cannot change for the
duration of a session.
Back to my proposed change of adding the segment to the document location. The
new document location will include [ shard index / segment name / document id
in the segment (not the overall index document id) ]. On the server side, we
keep old segments around for a certain amount of time after their last access,
basically an LRU cache. That way, if a segment is deleted and another system
still asks for data from that old segment, the data can still be retrieved.
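A minimal sketch of how this could look, again in plain Java rather than
against the real Blur or Lucene APIs. DocLocation, LruCache, retireSegment and
fetch are hypothetical names, segments are modeled as simple String arrays, and
the cache is bounded by entry count rather than by time since last access,
which is a simplification of the proposal.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentAwareFetch {
    // Hypothetical document location: shard index / segment name / local id.
    static final class DocLocation {
        final int shard;
        final String segment;
        final int localId;

        DocLocation(int shard, String segment, int localId) {
            this.shard = shard;
            this.segment = segment;
            this.localId = localId;
        }
    }

    // Access-ordered LinkedHashMap that evicts the least recently used entry
    // once the capacity is exceeded.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order entries by last access
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // This object models a single shard, so DocLocation.shard is not consulted.
    private final Map<String, String[]> live = new HashMap<>();
    private final LruCache<String, String[]> retired;

    SegmentAwareFetch(int retiredCapacity) {
        this.retired = new LruCache<>(retiredCapacity);
    }

    void addSegment(String name, String[] docs) {
        live.put(name, docs);
    }

    // Called when a merge replaces a segment: keep the old one readable.
    void retireSegment(String name) {
        String[] docs = live.remove(name);
        if (docs != null) {
            retired.put(name, docs);
        }
    }

    String fetch(DocLocation loc) {
        String[] docs = live.get(loc.segment);
        if (docs == null) {
            docs = retired.get(loc.segment); // get() also refreshes LRU order
        }
        if (docs == null) {
            throw new IllegalStateException("segment expired: " + loc.segment);
        }
        return docs[loc.localId];
    }

    public static void main(String[] args) {
        SegmentAwareFetch shard = new SegmentAwareFetch(2);
        shard.addSegment("aa", new String[] {"Document A"});
        DocLocation hit = new DocLocation(0, "aa", 0); // handed to another system

        shard.retireSegment("aa"); // "aa" is merged away
        shard.addSegment("ab", new String[] {"Other", "Document A"});

        System.out.println(shard.fetch(hit)); // Document A
    }
}
```

Because the location names the segment and the id within it, the fetch no
longer depends on which snapshot is current. It fails only once the old segment
has been evicted, and then it fails loudly instead of returning the wrong
document.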