[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855924#action_12855924
 ] 

Shai Erera commented on LUCENE-2386:
------------------------------------

I don't think that people need to write that "emptiness-detection-then-commit" 
code ... if they care, they can simply immediately call commit() after they 
open IW.

bq. Isn't opening IW with CREATE* mode called "specifically asking for"?

It depends on how you interpret the mode ... for example, you cannot pass 
OpenMode.APPEND for an empty Directory, because IW throws an exception. The 
modes are just meant to tell IW how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to 
it.
* CREATE - I don't care if there is an index in the Directory -- create a new 
one, zeroing out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

So if you pass CREATE on an already populated index, IW doesn't do an implicit 
commit; nothing changes until you call commit() yourself. But if you pass CREATE 
on an empty index, IW suddenly calls commit()? That's just an inconsistency 
that's meant to allow you to open an IR immediately after the "new IW()" call, 
regardless of what was there. And if you open that IR, then if the index was 
populated you see the previous set of documents, but if it wasn't you see 
nothing, even though you meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();
          
fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}

* The second line creates an empty file immediately, not waiting for close() or 
flush() -- which resembles the behavior that you're suggesting we should take 
w/ IW (which is today's behavior).
* The fourth line closes the file, flushing and writing the content.
* The fifth line *recreates* the file, empty, again, w/o calling close. So it 
zeros out the file content immediately, even before you have written a single 
byte to it.
* The sixth and seventh lines prove it by attempting to read from the file, and 
the output printed is -1.
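
For anyone who wants to reproduce this, the experiment above can be sketched as a self-contained program. This is only a minimal sketch: the class name TruncateOnOpenDemo and the method readAfterReopen are made up for illustration, and a temp file is assumed in place of d:/temp/tmpfile so it runs on any machine.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the name and structure are not from the original post.
class TruncateOnOpenDemo {

    /** Writes one byte, reopens the file for writing, then reads from it. */
    static int readAfterReopen() {
        try {
            // Assumption: a temp file stands in for d:/temp/tmpfile.
            File file = File.createTempFile("tmpfile", ".bin");
            file.deleteOnExit();

            // Write a single byte and close, so the content reaches the file.
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(3);
            }

            // Reopen for writing (no append flag) and read while it is still open.
            // The constructor truncates the file before anything is written.
            try (FileOutputStream fos = new FileOutputStream(file);
                 FileInputStream fis = new FileInputStream(file)) {
                return fis.read(); // end of stream on an empty file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterReopen()); // prints -1
    }
}
```

The second FileOutputStream is opened without the append flag, which is what makes the constructor truncate the file.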

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. So what 
I'm trying to show is that we don't fully adhere to the CREATE mode, and rightly 
so if you ask me - we shouldn't zero out the segments until the application has 
called commit(). But we choose to adhere differently to the CREATE* mode if the 
index is already populated. That's inconsistent behavior, at least from my 
perspective. It's also harder to explain and document, e.g. "you should call 
commit() if you used CREATE, in case you want to zero out everything 
immediately, and the Directory is not empty, but you don't need to call 
commit() if the directory was empty, Lucene will do it for you." -- so now how 
will the app know if it should call commit()? It will need to write a sort of 
"emptiness-detection-then-commit"?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also 
prepares an empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the 
empty directory. No implicit commit is happening by IW if the index does not 
exist.

But I think CREATE is too dangerous, and so I prefer to stick w/ the proposed 
change to the patch so far -- if you open an index in CREATE*, you should call 
commit before you can read it. That will adhere to the semantics of what the 
application wanted, whether it meant to zero out an existing Directory, or 
create a new one from scratch.
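
To make the proposed rule concrete, here is a toy in-memory model of it. It is not Lucene code and says nothing about how IW or IFD are implemented: OpenModeModel, ToyDirectory and ToyWriter are hypothetical names, and the exception thrown when reading a directory with no commit is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not Lucene's IndexWriter or Directory.
class OpenModeModel {

    enum OpenMode { APPEND, CREATE, CREATE_OR_APPEND }

    /** Stands in for a Directory: holds the last committed documents, or null if none. */
    static final class ToyDirectory {
        List<String> committed; // null means no commit exists yet

        boolean indexExists() { return committed != null; }

        /** What a reader opened right now would see (assumed to fail if no commit exists). */
        List<String> read() {
            if (committed == null) throw new IllegalStateException("no index in directory");
            return new ArrayList<>(committed);
        }
    }

    static final class ToyWriter {
        private final ToyDirectory dir;
        private final List<String> pending;

        ToyWriter(ToyDirectory dir, OpenMode mode) {
            if (mode == OpenMode.APPEND && !dir.indexExists()) {
                throw new IllegalStateException("APPEND requires an existing index");
            }
            this.dir = dir;
            // CREATE always starts from nothing; the other modes start from the last commit.
            boolean startEmpty = mode == OpenMode.CREATE || !dir.indexExists();
            this.pending = startEmpty ? new ArrayList<>() : new ArrayList<>(dir.committed);
            // Proposed rule: the constructor never commits, whatever the mode.
        }

        void addDocument(String doc) { pending.add(doc); }

        /** Only an explicit commit changes what readers see. */
        void commit() { dir.committed = new ArrayList<>(pending); }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyWriter first = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.indexExists());   // false: no implicit commit on a fresh directory
        first.addDocument("doc1");
        first.commit();

        ToyWriter second = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.read());          // [doc1]: old content stays visible until commit
        second.commit();
        System.out.println(dir.read());          // []: zeroed out only after the explicit commit
    }
}
```

In this model the application code is the same whether the directory was empty or populated: open in CREATE*, call commit(), then read.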

> IndexWriter commits unnecessarily on fresh Directory
> ----------------------------------------------------
>
>                 Key: LUCENE-2386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2386
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
> LUCENE-2386.patch, LUCENE-2386.patch
>
>
> I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
> Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
> unnecessary, and kind of brings back an autoCommit mode, in a strange way 
> ... why do we need that commit? Do we really expect people to open an 
> IndexReader on an empty Directory which they just passed to an IW w/ 
> create=true? If they want, they can simply call commit() right away on the IW 
> they created.
> I ran into this when writing a test which committed N times, then compared 
> the number of commits (via IndexReader.listCommits) and was surprised to see 
> N+1 commits.
> Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
> jumping on me .. so the change might not be that simple. But I think it's 
> manageable, so I'll try to attack it (and IFD specifically !) back :).
