Re: [jira] Commented: (LUCENE-710) Implement

Robert Engels Wed, 24 Jan 2007 08:11:10 -0800

I checked, and I don't see that disclaimer in the latest POSIX documentation.


-----Original Message-----
>From: Robert Engels <[EMAIL PROTECTED]>
>Sent: Jan 24, 2007 9:04 AM
>To: [email protected]
>Subject: Re: [jira] Commented: (LUCENE-710) Implement
>
>Curious, I guess I don't understand the BSD disclaimer. The application should 
>not need to track any of this. The OS should be tracking open FD and locks for 
>the process, and when it closes a FD on behalf of a process it should also 
>remove the locks.
>
>-----Original Message-----
>>From: "Marvin Humphrey (JIRA)" <[EMAIL PROTECTED]>
>>Sent: Jan 23, 2007 10:56 PM
>>To: [email protected]
>>Subject: [jira] Commented: (LUCENE-710) Implement "point in time" searching 
>>without relying on filesystem semantics
>>
>>
>>    [ https://issues.apache.org/jira/browse/LUCENE-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466911 ]
>>
>>Marvin Humphrey commented on LUCENE-710:
>>----------------------------------------
>>
>>On Jan 23, 2007, at 2:19 PM, Michael McCandless (JIRA) wrote:
>>
>>> First do no harm.
>>
>>If that was really your guiding philosophy, you would never change anything.
>>
>>> And Sun's Javadoc for the equivalent Java method, File.createNewFile, has a
>>> warning about not relying on it for locking:
>>> 
>>>   http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#createNewFile()
>>
>>That page recommends that you use FileLock instead, which maps to fcntl on
>>some systems.  The FreeBSD manpage on fcntl uses less delicate language than
>>Sun in pointing out the drawbacks:
>>
>>     This interface follows the completely stupid semantics of System V and
>>     IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated
>>     with a file for a given process are removed when any file descriptor for
>>     that file is closed by that process.  This semantic means that
>>     applications must be aware of any files that a subroutine library may
>>     access.
>>
>>Trying to guarantee that kind of discipline from library code severely limits
>>your options.
>>
>>> This warning is why we created the NativeFSLockFactory for Directory locking
>>> in the first place.
>>
>>Take a look at this bug, which explains how that warning got added.
>>
>>http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4676183
>>
>>Read the comment below -- the problem with the "protocol" they warn you
>>against using is with deleteOnExit(), not createNewFile().  I think you're
>>better off with dot-locks.
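
A minimal sketch of the dot-lock idea mentioned above, assuming the lock is taken with File.createNewFile() and released with an explicit delete() rather than deleteOnExit(). The DotLock class and its method names are hypothetical, made up for illustration; they are not Lucene or KinoSearch API.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical dot-lock helper (illustration only, not a Lucene class). */
final class DotLock {
    private final File lockFile;
    private boolean held;

    DotLock(File dir, String name) {
        this.lockFile = new File(dir, name);
    }

    /** Tries once; returns true only if this call created the lock file. */
    synchronized boolean obtain() {
        if (held) {
            return false; // this instance already owns the lock
        }
        try {
            // createNewFile() checks for existence and creates the file
            // in a single operation; false means the file already exists.
            held = lockFile.createNewFile();
            return held;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock by deleting the file explicitly (no deleteOnExit). */
    synchronized void release() {
        if (held) {
            if (!lockFile.delete()) {
                throw new IllegalStateException("could not delete " + lockFile);
            }
            held = false;
        }
    }

    /** True if any holder's lock file is currently present. */
    boolean isLocked() {
        return lockFile.exists();
    }
}
```

The trade-off is that a process that crashes while holding the lock leaves the file behind, so a stale lock has to be cleared by some other means.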
>>
>>> OK.  You could implement this in Lucene as a custom deletion policy once we
>>> get this committed (I think this is 6 proposals now for "deletion policy"
>>> for NFS), plus a wrapper around IndexReader.
>>
>>This was the response I got on the KinoSearch list:
>>
>>    We do not enable NFS writes, only reads (which is why Slashdot is able to
>>    reliably use NFS for its heavy load :-).  So I don't think that will work,
>>    if I understand you correctly.
>>
>>Lack of bulletproof support for NFS ain't gonna hold up my next release any
>>longer.  What a freakin' nightmare...
>>
>>> Implement "point in time" searching without relying on filesystem semantics
>>> ---------------------------------------------------------------------------
>>>
>>>                 Key: LUCENE-710
>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-710
>>>             Project: Lucene - Java
>>>          Issue Type: Improvement
>>>          Components: Index
>>>    Affects Versions: 2.1
>>>            Reporter: Michael McCandless
>>>         Assigned To: Michael McCandless
>>>            Priority: Minor
>>>
>>> This was touched on in recent discussion on dev list:
>>>   http://www.gossamer-threads.com/lists/lucene/java-dev/41700#41700
>>> and then more recently on the user list:
>>>   http://www.gossamer-threads.com/lists/lucene/java-user/42088
>>> Lucene's "point in time" searching currently relies on how the
>>> underlying storage handles deletion of files that are held open for
>>> reading.
>>> This is highly variable across filesystems.  For example, UNIX-like
>>> filesystems usually do "delete on last close", and Windows filesystems
>>> typically refuse to delete a file open for reading (so Lucene retries
>>> later).  But NFS just removes the file out from under the reader, and
>>> for that reason "point in time" searching doesn't work on NFS
>>> (see LUCENE-673).
>>> With the lockless commits changes (LUCENE-701), it's quite simple to
>>> re-implement "point in time searching" so as to not rely on filesystem
>>> semantics: we can just keep more than the last segments_N file (as
>>> well as all files they reference).
>>> This is also in keeping with the design goal of "rely on as little as
>>> possible from the filesystem".  E.g., with lockless commits we no longer
>>> re-use filenames (don't rely on filesystem cache being coherent) and we
>>> no longer use file renaming (because on Windows it can fail).  This
>>> would be another step of not relying on semantics of "deleting open
>>> files".  The less we require from filesystem the more portable Lucene
>>> will be!
>>> Where it gets interesting is what "policy" we would then use for
>>> removing segments_N files.  The policy now is "remove all but the last
>>> one".  I think we would keep this policy as the default.  Then you
>>> could imagine other policies:
>>>   * Keep past N days' worth
>>>   * Keep the last N
>>>   * Keep only those in active use by a reader somewhere (note: tricky
>>>     how to reliably figure this out when readers have crashed, etc.)
>>>   * Keep those "marked" as rollback points by some transaction, or
>>>     marked explicitly as a "snapshot".
>>>   * Or, roll your own: the "policy" would be an interface or abstract
>>>     class and you could make your own implementation.
>>> I think for this issue we could just create the framework
>>> (interface/abstract class for "policy" and invoke it from
>>> IndexFileDeleter) and then implement the current policy (delete all
>>> but most recent segments_N) as the default policy.
>>> In separate issue(s) we could then create the above more interesting
>>> policies.
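
The framework described above could be sketched roughly as follows. Every name here (CommitPoint, DeletionPolicy, onCommit, KeepOnlyLastPolicy, KeepLastNPolicy, FileDeleterSketch) is a hypothetical stand-in invented for illustration, not the API proposed in the issue; in particular FileDeleterSketch only imitates the role the description assigns to IndexFileDeleter and tracks generation numbers instead of real files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: one commit, identified by the N of its segments_N file. */
final class CommitPoint {
    final long generation;
    private boolean deleted;

    CommitPoint(long generation) { this.generation = generation; }

    /** Marks this commit (and the files it references) for removal. */
    void delete() { deleted = true; }

    boolean isDeleted() { return deleted; }
}

/** Hypothetical policy hook: decides which commits may be removed. */
interface DeletionPolicy {
    /** Called after each commit with all live commits, oldest first. */
    void onCommit(List<CommitPoint> commits);
}

/** The current behavior: remove all but the most recent commit. */
final class KeepOnlyLastPolicy implements DeletionPolicy {
    @Override
    public void onCommit(List<CommitPoint> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
    }
}

/** One of the "more interesting" policies: keep the last n commits. */
final class KeepLastNPolicy implements DeletionPolicy {
    private final int n;

    KeepLastNPolicy(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    @Override
    public void onCommit(List<CommitPoint> commits) {
        for (int i = 0; i < commits.size() - n; i++) {
            commits.get(i).delete();
        }
    }
}

/** Hypothetical stand-in for the component that invokes the policy. */
final class FileDeleterSketch {
    private final DeletionPolicy policy;
    private final List<CommitPoint> commits = new ArrayList<>();
    private long nextGeneration = 1;

    FileDeleterSketch(DeletionPolicy policy) { this.policy = policy; }

    /** Records a new commit, then lets the policy prune older ones. */
    void commit() {
        commits.add(new CommitPoint(nextGeneration++));
        // Hand the policy a copy so it cannot reorder the internal list.
        policy.onCommit(new ArrayList<>(commits));
        commits.removeIf(CommitPoint::isDeleted);
    }

    /** Generations of the commits that are still available to readers. */
    List<Long> liveGenerations() {
        List<Long> live = new ArrayList<>();
        for (CommitPoint c : commits) live.add(c.generation);
        return live;
    }
}
```

With KeepOnlyLastPolicy, three commits leave only generation 3 alive; with KeepLastNPolicy(2), five commits leave generations 4 and 5, so a reader opened on generation 4 would still find its files.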
>>> I think there are some important advantages to doing this:
>>>   * "Point in time" searching would work on NFS (it doesn't now
>>>     because NFS doesn't do "delete on last close"; see LUCENE-673)
>>>     and any other Directory implementations that don't work
>>>     currently.
>>>   * Transactional semantics become a possibility: you can set a
>>>     snapshot, do a bunch of stuff to your index, and then rollback to
>>>     the snapshot at a later time.
>>>   * If a reader crashes or the machine gets rebooted, etc., it could
>>>     choose to re-open the snapshot it had previously been using, whereas
>>>     now the reader must always switch to the last commit point.
>>>   * Searchers could search the same snapshot for follow-on actions.
>>>     Meaning, user does search, then next page, drill down (Solr),
>>>     drill up, etc.  These are each separate trips to the server and if
>>>     searcher has been re-opened, user can get inconsistent results (=
>>>     lost trust).  But with this change, one series of search interactions
>>>     could explicitly stay on the snapshot it had started with.
>>
>>-- 
>>This message is automatically generated by JIRA.
>>-
>>You can reply to this email to add a comment to the issue online.
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
>>
>
>
>
>
>



