[ 
https://issues.apache.org/jira/browse/LUCENE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750284#action_12750284
 ] 

Uwe Schindler commented on LUCENE-1881:
---------------------------------------

There is no practical solution for this: indexing is a one-way operation and is 
not reversible. This is why we offer "stored" fields, which keep the original 
or additional information alongside the indexed documents (e.g. the original 
strings that were indexed).

Lucene works with an "inverted index" 
([http://en.wikipedia.org/wiki/Inverted_index]). During inversion of these 
non-stored (indexed-only) fields, the fields are tokenized (a non-reversible 
step, because stop words are removed, terms are normalized, and so on) and the 
resulting terms are stored in a global list of all unique terms. The index then 
contains only references from each term to document ids (one-way, term -> 
document id). For your problem you would need the list of terms for one 
document, which is not easily obtained. It is possible to iterate over all 
terms/docs and try to rebuild the terms for a document, but you never get back 
the originally indexed contents, and it is very slow. Look into the tool 
"Luke", a GUI for Lucene that has some code to do this.
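
To make the one-way nature of the inverted index concrete, here is a minimal 
sketch in plain Java. It is not Lucene code: the class TinyInvertedIndex, its 
methods, and the stop-word list are all hypothetical and chosen only for 
illustration. It shows that the lookup term -> document ids is direct, while 
recovering the terms of one document means walking every term in the index, 
and that the result is still not the original text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy class for illustration only; this is not Lucene code.
class TinyInvertedIndex {
    // Assumed stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of"));

    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Tokenizes the text, lowercases each token, drops stop words, and
    // records only term -> doc id. The original string is not kept.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.split("\\W+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || STOP_WORDS.contains(term)) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Fast direction: term -> document ids.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // Slow direction: scan every term in the index and keep those whose
    // posting list contains docId.
    List<String> termsOf(int docId) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> entry : postings.entrySet()) {
            if (entry.getValue().contains(docId)) {
                terms.add(entry.getKey());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        int id = index.addDocument("The Return of the King");
        System.out.println(index.search("king")); // prints [0]
        System.out.println(index.termsOf(id));    // prints [king, return]
    }
}
```

In this toy version the rebuilt term list has lost the stop words, the 
original casing, and the word order, which is the same kind of loss described 
above for non-stored fields.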

You can only add your already indexed contents to another index using 
IndexWriter.addIndexes(). In this case they stay searchable but cannot be 
modified.

> Non-stored fields are not copied in writer.addDocument()?
> ---------------------------------------------------------
>
>                 Key: LUCENE-1881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1881
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 2.4.1
>         Environment: Linux
>            Reporter: Wai Wong
>            Assignee: Hoss Man
>            Priority: Critical
>
> We would like to modify the properties of stored documents.  The method is to 
> use IndexReader to open all files, modify some fields, and copy the document 
> via addDocument() of IndexWriter to another index.  But all fields that were 
> created using Field.Store.NO are no longer available for searching.
> Sample code in JSP is attached:
> <%@ page language="java" 
> import="org.apache.lucene.analysis.standard.StandardAnalyzer;"%>
> <%@ page language="java" import="org.apache.lucene.document.*;"%>
> <%@ page language="java" import="org.apache.lucene.index.*;"%>
> <%@ page language="java" import="org.apache.lucene.search.*;"%>
> <%@ page contentType="text/html; charset=utf8" %>
> <%
>     // create for testing
>     IndexWriter writer = new IndexWriter("/opt/wwwroot/351/Index/test", new 
> StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
>     Document doc = new Document();
>     doc.add(new Field("A", "1234", Field.Store.NO , 
> Field.Index.NOT_ANALYZED));
>     doc.add(new Field("B", "abcd", Field.Store.NO , 
> Field.Index.NOT_ANALYZED));
>     writer.addDocument(doc);
>     writer.close();
>     // check ok
>     Query q = new TermQuery(new Term("A", "1234"));
>     Searcher s = new IndexSearcher("/opt/wwwroot/351/Index/test");
>     Hits h = s.search(q);
>     out.println("# of document found is " + h.length());        // it is ok
>     // update the document to change or remove a field
>     IndexReader r = IndexReader.open("/opt/wwwroot/351/Index/test");
>     doc = r.document(0);
>     r.deleteDocument(0);
>     r.close();
>     doc.removeField("B");
>     writer = new IndexWriter("/opt/wwwroot/351/Index/test1", new 
> StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
>     writer.addDocument(doc);
>     writer.optimize();
>     writer.close();
>     // test again
>     s = new IndexSearcher("/opt/wwwroot/351/Index/test1");
>     h = s.search(q);
>     out.println("<P># of document found is now " + h.length());
>     r = IndexReader.open("/opt/wwwroot/351/Index/test1");
>     out.println("<P> max Doc is " + r.maxDoc());
> %>


