Let me start by stating that I am almost certain that I am doing something wrong, and I hope that I am, because if not there is a VERY large bug in Lucene. What I am trying to do is use the method deleteDocuments(Term... terms) of the IndexWriter class to delete several arrays of Term objects, each fed to it via a separate thread. Each array has around 460k+ Term objects in it. The issue is that after running for around 30 minutes or more the method finishes, I then have a commit run, and nothing changes in my files.
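To make the setup concrete, here is a stripped-down sketch of what each thread does on the delete side (class and field names below are placeholders, not my actual code, which follows further down):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Sketch of the per-thread delete pattern described above. The writer is
// the one IndexWriter on the source index (shared by the threads);
// "FileName" is the field I delete by. Placeholder names only.
class DeleteTask implements Runnable {
    private final IndexWriter writer;
    private final String[] fileNames; // roughly 460k+ values per thread

    DeleteTask(IndexWriter writer, String[] fileNames) {
        this.writer = writer;
        this.fileNames = fileNames;
    }

    @Override
    public void run() {
        Term[] terms = new Term[fileNames.length];
        for (int i = 0; i < fileNames.length; i++) {
            terms[i] = new Term("FileName", fileNames[i]);
        }
        try {
            writer.deleteDocuments(terms); // this call is where the 30+ minutes go
            writer.commit();               // afterwards the files still look unchanged
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```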
To be fair, I am running a custom Directory implementation that might be causing problems, but I do not think that is the case, as I do not see any of my Directory methods in the stack trace. In fact, when I set breakpoints inside the delete methods of my Directory implementation, they never get hit. To be clear, replacing the custom Directory implementation with a standard one is not an option due to the nature of the data, which is made up of terabytes of small (1 KB and less) files. So, if the issue is in the Directory implementation, I have to figure out how to fix it.

Below are the pieces of code that I think are relevant to this issue, as well as a copy of the stack trace of the thread that was doing work when I paused the debug session. As you are likely to notice, the thread is called a DBCloner because it is being used to clone the underlying index-based database (needed to avoid storing trillions of files directly on disk). The idea is to duplicate the selected group of terms into a new database and then delete the original terms from the original database. The duplication works wonderfully, but no matter what I do, including cutting the program down to a single thread, I cannot shrink the database, and the deletes take drastically too long.

In an attempt to be as helpful as possible, I will say this: I have been tracing this problem for a few days and have seen that BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef) is where the majority of the execution time is spent. I have also noticed that this method returns false MUCH more often than it returns true. I have been trying to figure out how the mechanics of this process work, just in case the issue is not in my code and I could find the problem there, but so far I have not found it in either Lucene 4.5.1 or Lucene 4.6.

If anyone has any ideas as to what I might be doing wrong, I would really appreciate reading what you have to say. Thanks in advance.

Jason

```java
private void cloneDB() throws QueryNodeException {
    Document doc;
    ArrayList<String> fileNames;
    // Each thread handles its own contiguous slice of doc IDs.
    int start = docRanges[threadNumber * 2];
    int stop = docRanges[(threadNumber * 2) + 1];
    try {
        fileNames = new ArrayList<String>(docsPerThread);
        for (int i = start; i < stop; i++) {
            doc = searcher.doc(i);
            try {
                // Copy the document into the new database and remember
                // its key so it can be deleted from the original later.
                adder.addDoc(doc);
                fileNames.add(doc.get("FileName"));
            } catch (TransactionExceptionRE | TransactionException | LockConflictException te) {
                adder.txnAbort();
                System.err.println(Thread.currentThread().getName()
                        + ": Adding a message failed, retrying.");
            }
        }
        // Delete the copied documents from the original database.
        deleters[threadNumber].deleteTerms("FileName", fileNames);
        deleters[threadNumber].commit();
    } catch (IOException | ParseException ex) {
        Logger.getLogger(DocReader.class.getName()).log(Level.SEVERE, null, ex);
    }
}
```

```java
public void deleteTerms(String dbField, ArrayList<String> fieldTexts) throws IOException {
    // Build one Term per value and hand them all to the writer at once.
    Term[] terms = new Term[fieldTexts.size()];
    for (int i = 0; i < fieldTexts.size(); i++) {
        terms[i] = new Term(dbField, fieldTexts.get(i));
    }
    writer.deleteDocuments(terms);
}
```
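As a sanity check that does not depend on my Directory's files, the deletes can also be confirmed through a plain reader reopen. This is only a sketch using stock Lucene 4.x API, not code from my project:

```java
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;

// Sketch: check whether buffered deletes were actually applied, without
// looking at the Directory's files. Uses only standard Lucene 4.x API.
public final class DeleteCheck {
    private DeleteCheck() {}

    public static void report(Directory dir) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            // maxDoc counts deleted docs, numDocs does not; a gap between
            // them means deletes were recorded even if no file shrank yet.
            System.out.println("maxDoc=" + reader.maxDoc()
                    + " numDocs=" + reader.numDocs()
                    + " deleted=" + reader.numDeletedDocs());
        } finally {
            reader.close();
        }
    }
}
```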
The IndexWriter method being called is:

```java
public void deleteDocuments(Term... terms) throws IOException
```

And here is the stack trace of the worker thread at the moment I paused the debug session:

```
Thread [DB Cloner 2] (Suspended)
    owns: BufferedUpdatesStream (id=54)
    owns: IndexWriter (id=49)
    FST<T>.readFirstRealTargetArc(long, Arc<T>, BytesReader) line: 979
    FST<T>.findTargetArc(int, Arc<T>, Arc<T>, BytesReader) line: 1220
    BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef) line: 1679
    BufferedUpdatesStream.applyTermDeletes(Iterable<Term>, ReadersAndUpdates, SegmentReader) line: 414
    BufferedUpdatesStream.applyDeletesAndUpdates(ReaderPool, List<SegmentCommitInfo>) line: 283
    IndexWriter.applyAllDeletesAndUpdates() line: 3112
    IndexWriter.applyDeletesAndPurge(boolean) line: 4641
    DocumentsWriter$ApplyDeletesEvent.process(IndexWriter, boolean, boolean) line: 673
    IndexWriter.processEvents(Queue<Event>, boolean, boolean) line: 4665
    IndexWriter.processEvents(boolean, boolean) line: 4657
    IndexWriter.deleteDocuments(Term...) line: 1421
    DocDeleter.deleteTerms(String, ArrayList<String>) line: 95
    DBCloner.cloneDB() line: 233
    DBCloner.run() line: 133
    Thread.run() line: 744
```
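For completeness, the same seekExact lookup can be driven directly through the public TermsEnum API. The sketch below (standard Lucene 4.x calls, placeholder names, not code from my project) counts how many of the values being deleted actually exist as terms in the index, which corresponds to the hit/miss ratio I keep seeing in the profiler:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Sketch: count how many of the values we are deleting by actually exist
// as terms in the index. If most lookups miss, seekExact returns false
// most of the time, matching what shows up in the profiler.
public final class TermProbe {
    private TermProbe() {}

    public static void probe(IndexReader reader, String field,
                             Iterable<String> values) throws IOException {
        Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            System.out.println("field '" + field + "' has no terms");
            return;
        }
        TermsEnum te = terms.iterator(null); // Lucene 4.x signature: iterator(reuse)
        long hits = 0, misses = 0;
        for (String v : values) {
            if (te.seekExact(new BytesRef(v))) hits++; else misses++;
        }
        System.out.println("hits=" + hits + " misses=" + misses);
    }
}
```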