[jira] Commented: (LUCENE-1537) InstantiatedIndexReader.clone

Karl Wettin (JIRA) Sun, 15 Feb 2009 04:59:26 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673610#action_12673610 ]


Karl Wettin commented on LUCENE-1537:
-------------------------------------

I haven't tried it out yet, but I have a few comments and questions about the patch:

{code}
Index: contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexReader.java
+  
+  public Object clone() {
+    try {
+      doCommit();
+      InstantiatedIndex clonedIndex = index.cloneWithDeletesNorms();
+      return new InstantiatedIndexReader(clonedIndex);
+    } catch (IOException ioe) {
+      throw new RuntimeException("", ioe);
+    }
+  }

Index: contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndex.java
+
+  InstantiatedIndex cloneWithDeletesNorms() {
+    InstantiatedIndex clone = new InstantiatedIndex();
+    clone.version = System.currentTimeMillis();
+    clone.documentsByNumber = documentsByNumber;
+    clone.deletedDocuments = new HashSet<Integer>(deletedDocuments);
+    clone.termsByFieldAndText = termsByFieldAndText;
+    clone.orderedTerms = orderedTerms;
+    clone.normsByFieldNameAndDocumentNumber = new HashMap<String, byte[]>(normsByFieldNameAndDocumentNumber);
+    clone.fieldSettings = fieldSettings;
+    return clone;
+  }
{code}

Perhaps we should move deleted documents to the reader? It might be a bit of
work to hook it up with the term enum etc., but it could be worth looking into.
I think it makes more sense to keep the same instance of InstantiatedIndex and
only produce a cloned InstantiatedIndexReader. It is reader#clone we are
calling, so cloning the store as well sounds like a likely source of future bugs.
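To make the suggestion concrete, here is a minimal Java sketch under stated assumptions. `SharedStore` and `ReaderSketch` are hypothetical stand-ins, not Lucene or contrib/instantiated classes, and documents are plain strings. Deletions live in a per-reader `java.util.BitSet`, which this sketch uses in place of the patch's `Set<Integer>`. As a result, `clone()` copies only that set, and every clone keeps a reference to the same store. Norms, which the patch also copies, are left out.

{code:java}
import java.util.BitSet;

// Hypothetical stand-in for InstantiatedIndex: a document store that all
// reader clones share by reference and never copy.
final class SharedStore {
    final String[] documentsByNumber;

    SharedStore(String... documents) {
        this.documentsByNumber = documents;
    }
}

// Hypothetical reader that owns its deletions, so clone() only copies
// the small per-reader state.
final class ReaderSketch implements Cloneable {
    private final SharedStore store;
    private BitSet deletedDocuments;

    ReaderSketch(SharedStore store) {
        this.store = store;
        this.deletedDocuments = new BitSet(store.documentsByNumber.length);
    }

    int maxDoc() {
        return store.documentsByNumber.length;
    }

    int numDocs() {
        return maxDoc() - deletedDocuments.cardinality();
    }

    boolean isDeleted(int n) {
        return deletedDocuments.get(n);
    }

    void deleteDocument(int n) {
        deletedDocuments.set(n);
    }

    SharedStore store() {
        return store;
    }

    @Override
    public ReaderSketch clone() {
        try {
            // Shallow copy: the store reference is shared with the original.
            ReaderSketch copy = (ReaderSketch) super.clone();
            // Deep-copy only the deletions so the two readers diverge independently.
            copy.deletedDocuments = (BitSet) deletedDocuments.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class implements Cloneable
        }
    }
}
{code}

Deleting a document on a clone leaves the original's `numDocs()` unchanged, while both readers still return the same `store()` instance.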



I see there are some leftovers from your attempt to handle non-optimized
readers:

{code}
-    documentsByNumber = new InstantiatedDocument[sourceIndexReader.numDocs()];
+    documentsByNumber = new InstantiatedDocument[sourceIndexReader.maxDoc()];
 
     // create documents
     for (int i = 0; i < sourceIndexReader.numDocs(); i++) {
{code}

I think if you switch to maxDoc, the loop should also run up to maxDoc and skip
any deleted documents.
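The loop being suggested might look roughly like the sketch below. `SourceReader` is a hypothetical minimal interface whose method names mirror `IndexReader.maxDoc()` and `IndexReader.isDeleted(int)`; it is not Lucene itself. Documents are plain strings for brevity. This version keeps document numbers aligned with the source and leaves deleted slots `null`. Whether InstantiatedIndex should do that or compact the numbering instead is not settled here.

{code:java}
import java.util.BitSet;

// Hypothetical minimal view of a source reader (not Lucene's IndexReader).
interface SourceReader {
    int maxDoc();
    boolean isDeleted(int n);
    String document(int n);
}

// Simple array-backed SourceReader, for illustration only.
final class ArraySourceReader implements SourceReader {
    private final String[] docs;
    private final BitSet deleted;

    ArraySourceReader(String[] docs, BitSet deleted) {
        this.docs = docs;
        this.deleted = deleted;
    }

    public int maxDoc() { return docs.length; }
    public boolean isDeleted(int n) { return deleted.get(n); }
    public String document(int n) { return docs[n]; }
}

final class CopyLoop {
    // Size the array by maxDoc() and visit every slot up to maxDoc(),
    // skipping deleted documents; deleted slots stay null.
    static String[] copyDocuments(SourceReader source) {
        String[] documentsByNumber = new String[source.maxDoc()];
        for (int i = 0; i < source.maxDoc(); i++) {
            if (source.isDeleted(i)) {
                continue;
            }
            documentsByNumber[i] = source.document(i);
        }
        return documentsByNumber;
    }
}
{code}

With documents `a`, `b`, `c` and document 1 deleted, numDocs is 2. A loop bounded by numDocs would therefore stop before copying `c`. This loop returns `[a, null, c]`.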



{code}
-    for (InstantiatedDocument document : getDocumentsByNumber()) {
+    //for (InstantiatedDocument document : getDocumentsByNumber()) {
+    for (InstantiatedDocument document : getDocumentsNotDeleted()) {
       for (Field field : (List<Field>) document.getDocument().getFields()) {
         if (field.isTermVectorStored() && field.isStoreOffsetWithTermVector()) {
           TermPositionVector termPositionVector = (TermPositionVector) sourceIndexReader.getTermFreqVector(document.getDocumentNumber(), field.name());
@@ -312,7 +325,15 @@
   public InstantiatedDocument[] getDocumentsByNumber() {
     return documentsByNumber;
   }
-
+  
+  public List<InstantiatedDocument> getDocumentsNotDeleted() {
+    List<InstantiatedDocument> list = new ArrayList<InstantiatedDocument>(documentsByNumber.length-deletedDocuments.size());
+    for (int x=0; x < documentsByNumber.length; x++) {
+      if (!deletedDocuments.contains(x)) list.add(documentsByNumber[x]);
+    }
+    return list;
+  } 
+  
{code}

As the source never contains any deleted documents, this doesn't really do
anything but consume some resources, does it?
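If the method stays, one option is to skip the copy entirely when nothing is deleted, which is the situation described above. This is a sketch, not the patch's code. `NotDeletedView` and the generic `documentsNotDeleted` helper are hypothetical, and the filtering loop follows the patch.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class NotDeletedView {
    // Same filtering as the patch's getDocumentsNotDeleted(), except that an
    // empty deletion set returns a read-only view of the array with no copy.
    static <T> List<T> documentsNotDeleted(T[] documentsByNumber, Set<Integer> deletedDocuments) {
        if (deletedDocuments.isEmpty()) {
            return Collections.unmodifiableList(Arrays.asList(documentsByNumber));
        }
        // Capacity guarded against a negative value if the set holds out-of-range numbers.
        List<T> list = new ArrayList<T>(Math.max(0, documentsByNumber.length - deletedDocuments.size()));
        for (int x = 0; x < documentsByNumber.length; x++) {
            if (!deletedDocuments.contains(x)) {
                list.add(documentsByNumber[x]);
            }
        }
        return list;
    }
}
{code}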



{code}
-    int maxVal = getAssociatedDocuments()[max].getDocument().getDocumentNumber();
+    InstantiatedTermDocumentInformation itdi = getAssociatedDocuments()[max];
+    InstantiatedDocument id = itdi.getDocument();
+    int maxVal = id.getDocumentNumber();
+    //int maxVal = getAssociatedDocuments()[max].getDocument().getDocumentNumber();
{code}

Is this refactor just for debugging purposes? I find it harder to read than the 
original one-liner.

> InstantiatedIndexReader.clone
> -----------------------------
>
>                 Key: LUCENE-1537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1537
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Karl Wettin
>            Priority: Trivial
>             Fix For: 2.9
>
>         Attachments: LUCENE-1537.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> This patch will implement IndexReader.clone for InstantiatedIndexReader.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.