[ https://issues.apache.org/jira/browse/LUCENE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673610#action_12673610 ]
Karl Wettin commented on LUCENE-1537:
-------------------------------------
I didn't try it out yet, but I have a few comments and questions on the patch:
{code}
Index: contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexReader.java
+
+ public Object clone() {
+ try {
+ doCommit();
+ InstantiatedIndex clonedIndex = index.cloneWithDeletesNorms();
+ return new InstantiatedIndexReader(clonedIndex);
+ } catch (IOException ioe) {
+ throw new RuntimeException("", ioe);
+ }
+ }
Index: contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndex.java
+
+ InstantiatedIndex cloneWithDeletesNorms() {
+ InstantiatedIndex clone = new InstantiatedIndex();
+ clone.version = System.currentTimeMillis();
+ clone.documentsByNumber = documentsByNumber;
+ clone.deletedDocuments = new HashSet<Integer>(deletedDocuments);
+ clone.termsByFieldAndText = termsByFieldAndText;
+ clone.orderedTerms = orderedTerms;
+ clone.normsByFieldNameAndDocumentNumber = new HashMap<String, byte[]>(normsByFieldNameAndDocumentNumber);
+ clone.fieldSettings = fieldSettings;
+ return clone;
+ }
{code}
Perhaps we should move the deleted documents to the reader? It might take a bit
of work to hook that up with the term enum etc., but it could be worth looking
into. I think it makes more sense to keep the same instance of InstantiatedIndex
and only produce a cloned InstantiatedIndexReader. It is reader#clone we are
calling, so cloning the store as well sounds like an invitation for future bugs.
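To make the suggestion concrete, here is a rough sketch of what I have in mind. It is not taken from the patch: SharedIndex and SketchReader are hypothetical stand-ins for InstantiatedIndex and InstantiatedIndexReader, documents are reduced to plain strings, and norms are left out entirely. The only point is that the reader owns the deletions, so clone() copies that set and shares the index instance.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for InstantiatedIndex: only the shared data, no deletions.
final class SharedIndex {
    final String[] documentsByNumber;

    SharedIndex(String[] documentsByNumber) {
        this.documentsByNumber = documentsByNumber;
    }
}

// Hypothetical stand-in for InstantiatedIndexReader: the reader owns the deletions.
final class SketchReader implements Cloneable {
    private final SharedIndex index;
    private final Set<Integer> deletedDocuments;

    SketchReader(SharedIndex index) {
        this(index, new HashSet<Integer>());
    }

    private SketchReader(SharedIndex index, Set<Integer> deletedDocuments) {
        this.index = index;
        this.deletedDocuments = deletedDocuments;
    }

    SharedIndex index() {
        return index;
    }

    int maxDoc() {
        return index.documentsByNumber.length;
    }

    int numDocs() {
        return maxDoc() - deletedDocuments.size();
    }

    boolean isDeleted(int n) {
        return deletedDocuments.contains(n);
    }

    void deleteDocument(int n) {
        if (n < 0 || n >= maxDoc()) {
            throw new IndexOutOfBoundsException("no such document: " + n);
        }
        deletedDocuments.add(n);
    }

    // Copy only the per-reader state; the index instance itself is shared.
    @Override
    public SketchReader clone() {
        return new SketchReader(index, new HashSet<Integer>(deletedDocuments));
    }
}
```

With this shape, a delete made through the clone is invisible to the original reader, and nothing in the store needs to be duplicated. Whether the term enum and term docs can cheaply consult a reader-level set is the part I have not looked into.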
I see there are some leftovers from your attempt to handle non-optimized
readers:
{code}
- documentsByNumber = new InstantiatedDocument[sourceIndexReader.numDocs()];
+ documentsByNumber = new InstantiatedDocument[sourceIndexReader.maxDoc()];
// create documents
for (int i = 0; i < sourceIndexReader.numDocs(); i++) {
{code}
I think if you switch to maxDoc, the loop should also use maxDoc and skip any
deleted documents.
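Something along these lines is what I mean. This is only a sketch: SourceReader is a hypothetical minimal interface modelled on the maxDoc(), isDeleted(int) and document(int) methods of IndexReader, DocumentCopier is a made-up name, and documents are reduced to strings to keep it self-contained.

```java
// Hypothetical minimal view of the source reader, for illustration only.
interface SourceReader {
    int maxDoc();

    boolean isDeleted(int n);

    String document(int n);
}

final class DocumentCopier {
    private DocumentCopier() {
    }

    // Size the array by maxDoc and iterate to maxDoc, leaving deleted slots null.
    static String[] copyLiveDocuments(SourceReader source) {
        String[] documentsByNumber = new String[source.maxDoc()];
        for (int i = 0; i < source.maxDoc(); i++) {
            if (source.isDeleted(i)) {
                continue; // deleted in the source: keep the slot empty
            }
            documentsByNumber[i] = source.document(i);
        }
        return documentsByNumber;
    }
}
```

Note that this leaves null entries in the array, so any later code that iterates over documentsByNumber would have to skip them.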
{code}
- for (InstantiatedDocument document : getDocumentsByNumber()) {
+ //for (InstantiatedDocument document : getDocumentsByNumber()) {
+ for (InstantiatedDocument document : getDocumentsNotDeleted()) {
for (Field field : (List<Field>) document.getDocument().getFields()) {
if (field.isTermVectorStored() && field.isStoreOffsetWithTermVector()) {
TermPositionVector termPositionVector = (TermPositionVector) sourceIndexReader.getTermFreqVector(document.getDocumentNumber(), field.name());
@@ -312,7 +325,15 @@
public InstantiatedDocument[] getDocumentsByNumber() {
return documentsByNumber;
}
-
+
+ public List<InstantiatedDocument> getDocumentsNotDeleted() {
+ List<InstantiatedDocument> list = new ArrayList<InstantiatedDocument>(documentsByNumber.length-deletedDocuments.size());
+ for (int x=0; x < documentsByNumber.length; x++) {
+ if (!deletedDocuments.contains(x)) list.add(documentsByNumber[x]);
+ }
+ return list;
+ }
+
{code}
As the source never contains any deleted documents, this doesn't really do
anything except consume a bit of resources, or am I missing something?
{code}
- int maxVal = getAssociatedDocuments()[max].getDocument().getDocumentNumber();
+ InstantiatedTermDocumentInformation itdi = getAssociatedDocuments()[max];
+ InstantiatedDocument id = itdi.getDocument();
+ int maxVal = id.getDocumentNumber();
+ //int maxVal = getAssociatedDocuments()[max].getDocument().getDocumentNumber();
{code}
Is this refactor just for debugging purposes? I find it harder to read than the
original one-liner.
> InstantiatedIndexReader.clone
> -----------------------------
>
> Key: LUCENE-1537
> URL: https://issues.apache.org/jira/browse/LUCENE-1537
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Assignee: Karl Wettin
> Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1537.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> This patch will implement IndexReader.clone for InstantiatedIndexReader.