I knew that I had forgotten something. Below is the line I use to create the field that I am trying to delete entries with. I hope this avoids some confusion. Thank you very much to anyone who takes the time to read these messages.
doc.add(new StringField("FileName", filename, Field.Store.YES));

On Sat, Dec 14, 2013 at 1:28 AM, Jason Corekin <jason.core...@gmail.com> wrote:

> Let me start by stating that I am almost certain that I am doing something
> wrong, and I hope that I am, because if not there is a VERY large bug in
> Lucene. What I am trying to do is use the method
>
>     deleteDocuments(Term... terms)
>
> out of the IndexWriter class to delete several Term object arrays, each
> fed to it via a separate thread. Each array has around 460k Term objects
> in it. The issue is that after running for around 30 minutes or more the
> method finishes, I then run a commit, and nothing changes in my files.
>
> To be fair, I am running a custom Directory implementation that might be
> causing problems, but I do not think that this is the case, as I do not
> even see any of my Directory methods in the stack trace. In fact, when I
> set breakpoints inside the delete methods of my Directory implementation,
> they never even get hit. To be clear, replacing the custom Directory
> implementation with a standard one is not an option due to the nature of
> the data, which is made up of terabytes of small (1k and less) files. So,
> if the issue is in the Directory implementation, I have to figure out how
> to fix it.
>
> Below are the pieces of code that I think are relevant to this issue, as
> well as a copy of the stack trace of the thread that was doing work when I
> paused the debug session. As you are likely to notice, the thread is
> called a DBCloner because it is being used to clone the underlying
> index-based database (needed to avoid storing trillions of files directly
> on disk). The idea is to duplicate the selected group of terms into a new
> database and then delete the original terms from the original database.
> The duplication works wonderfully, but no matter what I do, including
> cutting the program down to one thread, I cannot shrink the database, and
> the deletes take drastically too long.
>
> In an attempt to be as helpful as possible, I will say this: I have been
> tracing this problem for a few days and have seen that
>
>     BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef)
>
> is where the majority of the execution time is spent. I have also noticed
> that this method returns false MUCH more often than it returns true. I
> have been trying to figure out how the mechanics of this process work,
> just in case the issue was not in my code and I might have been able to
> find the problem. But I have yet to find the problem, either in Lucene
> 4.5.1 or Lucene 4.6. If anyone has any ideas as to what I might be doing
> wrong, I would really appreciate reading what you have to say. Thanks in
> advance.
> Jason
>
>     private void cloneDB() throws QueryNodeException {
>         Document doc;
>         ArrayList<String> fileNames;
>         int start = docRanges[(threadNumber * 2)];
>         int stop = docRanges[(threadNumber * 2) + 1];
>
>         try {
>             fileNames = new ArrayList<String>(docsPerThread);
>             for (int i = start; i < stop; i++) {
>                 doc = searcher.doc(i);
>                 try {
>                     adder.addDoc(doc);
>                     fileNames.add(doc.get("FileName"));
>                 } catch (TransactionExceptionRE | TransactionException | LockConflictException te) {
>                     adder.txnAbort();
>                     System.err.println(Thread.currentThread().getName()
>                             + ": Adding a message failed, retrying.");
>                 }
>             }
>             deleters[threadNumber].deleteTerms("FileName", fileNames);
>             deleters[threadNumber].commit();
>         } catch (IOException | ParseException ex) {
>             Logger.getLogger(DocReader.class.getName()).log(Level.SEVERE, null, ex);
>         }
>     }
>
>     public void deleteTerms(String dbField, ArrayList<String> fieldTexts) throws IOException {
>         Term[] terms = new Term[fieldTexts.size()];
>         for (int i = 0; i < fieldTexts.size(); i++) {
>             terms[i] = new Term(dbField, fieldTexts.get(i));
>         }
>         writer.deleteDocuments(terms);
>     }
>
>     public void deleteDocuments(Term... terms) throws IOException
>
> Thread [DB Cloner 2] (Suspended)
>     owns: BufferedUpdatesStream (id=54)
>     owns: IndexWriter (id=49)
>     FST<T>.readFirstRealTargetArc(long, Arc<T>, BytesReader) line: 979
>     FST<T>.findTargetArc(int, Arc<T>, Arc<T>, BytesReader) line: 1220
>     BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef) line: 1679
>     BufferedUpdatesStream.applyTermDeletes(Iterable<Term>, ReadersAndUpdates, SegmentReader) line: 414
>     BufferedUpdatesStream.applyDeletesAndUpdates(ReaderPool, List<SegmentCommitInfo>) line: 283
>     IndexWriter.applyAllDeletesAndUpdates() line: 3112
>     IndexWriter.applyDeletesAndPurge(boolean) line: 4641
>     DocumentsWriter$ApplyDeletesEvent.process(IndexWriter, boolean, boolean) line: 673
>     IndexWriter.processEvents(Queue<Event>, boolean, boolean) line: 4665
>     IndexWriter.processEvents(boolean, boolean) line: 4657
>     IndexWriter.deleteDocuments(Term...) line: 1421
>     DocDeleter.deleteTerms(String, ArrayList<String>) line: 95
>     DBCloner.cloneDB() line: 233
>     DBCloner.run() line: 133
>     Thread.run() line: 744
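P.S. In case a self-contained repro helps, here is a stripped-down sketch of the round trip I would expect to work, with a plain RAMDirectory standing in for my custom Directory implementation (the field name matches my code above; the class name and file name value are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class DeleteByTermRepro {
        public static void main(String[] args) throws Exception {
            // Stand-in for the custom Directory implementation.
            Directory dir = new RAMDirectory();
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
                    new StandardAnalyzer(Version.LUCENE_46));
            IndexWriter writer = new IndexWriter(dir, iwc);

            // Index one document exactly the way I build mine.
            Document doc = new Document();
            doc.add(new StringField("FileName", "file-0001", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();

            // Delete by the same untokenized term, then commit.
            writer.deleteDocuments(new Term("FileName", "file-0001"));
            writer.commit();
            writer.close();

            // If the delete was applied, numDocs should print 0.
            IndexReader reader = DirectoryReader.open(dir);
            System.out.println("numDocs after delete: " + reader.numDocs());
            reader.close();
            dir.close();
        }
    }

Against my custom Directory, the equivalent sequence leaves the document count unchanged, which is the behavior I am trying to explain.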