On Thu, 11 Nov 2004 18:39:58 -0500, Luke Shannon <[EMAIL PROTECTED]> wrote:
> Thank you for the tip Craig. I am new to Lucene, user support groups, and
> not the most experienced programmer. To be honest I am starting to feel a
> little over my head with this project.
>
I know how you feel ... it can be overwhelming when you first enter the wide
and wonderful world of open source ... :-)

As it happens, each of the subprojects at Jakarta (including Lucene) has its
own mailing lists for users and developers. This particular list is for the
Jakarta Commons packages, which are a bunch of small reusable libraries. I've
cc'd this message to the user list for Lucene -- it will need to be approved
by a moderator before it's posted -- where you should be able to find people
who know a lot more about Lucene than I do.

You'll likely want to subscribe to the Lucene User list yourself in order to
ask and answer questions about Lucene directly. To do so, send an empty
message to <[EMAIL PROTECTED]>. Info about all the mailing lists at Jakarta
can be found at:

http://jakarta.apache.org/site/mail.html

Good luck!

Craig

> My company has a content management product. Each time someone changes the
> directory structure or a file within it, that portion of the site needs to
> be re-indexed so the changes are reflected in future searches (indexing
> must happen at run time).
>
> I have written an Indexer class with a static Index() method. The idea is
> to call the method every time something changes and the index needs to be
> re-examined. I am hoping the logic put in by Doug Cutting surrounding the
> UID will make indexing efficient enough to be called this frequently.
>
> This class works great when I test it on my own little site (I have about
> 2000 files). But when I drop the functionality into the QA environment I
> get a locking error.
>
> I can't access the stack trace; all I can get at is a log file the
> application writes to. Here is the section my class wrote. It was right in
> the middle of indexing when the lock issue hit.
>
> I don't know if the problem is in my code or something in the existing
> application.
>
> Error Message:
> ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
> |INFO|INDEXING INFO: Start Indexing new content.
> |INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
> |INFO|INDEXING INFO: Beginning Incremental update comparisons
> [the line above is repeated 17 times in total]
> |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:
> Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
>
> Here is my code. You will recognize it as pretty much the IndexHTML class
> from the Lucene demo written by Doug Cutting. I have put in a ton of
> comments in an attempt to understand what is going on.
>
> Any help would be appreciated.
>
> Luke
>
> package com.fbhm.bolt.search;
>
> /*
>  * Created on Nov 11, 2004
>  *
>  * This class will create a single index file for the Content
>  * Management System (CMS). It contains logic to ensure
>  * indexing is done "intelligently".
>  * Based on IndexHTML.java
>  * from the demo folder that ships with Lucene
>  */
>
> import java.io.File;
> import java.io.IOException;
> import java.util.Arrays;
> import java.util.Date;
>
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.index.TermEnum;
> import org.pdfbox.searchengine.lucene.LucenePDFDocument;
> import org.apache.lucene.demo.HTMLDocument;
>
> import com.alaia.common.debug.Trace;
> import com.alaia.common.util.AppProperties;
>
> /**
>  * @author lshannon
>  * Description: <br>
>  * This class is used to index a content folder. It contains logic to
>  * ensure that only new documents, or documents that have been modified
>  * since the last indexing pass, are indexed. <br>
>  * Based on code written by Doug Cutting in the IndexHTML class found in
>  * the Lucene demo
>  */
> public class Indexer {
>     // true during the deletion pass; this is when the index already exists
>     private static boolean deleting = false;
>
>     // object to read existing indexes
>     private static IndexReader reader;
>
>     // object to write to the index folder
>     private static IndexWriter writer;
>
>     // this will be used to walk the terms of the index
>     private static TermEnum uidIter;
>
>     /*
>      * This static method does all the work; the end result is an
>      * up-to-date index folder
>      */
>     public static void Index() {
>         // we will assume to start that the index has to be created
>         boolean create = true;
>         // set the location of the index folder
>         String indexFileLocation =
>             AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
>         // set the location of the content folder
>         String contentFolderLocation =
>             AppProperties.getPropertyAsString("site.root");
>         // manage whether the index needs to be created or not
>         File index = new File(indexFileLocation);
>         File root = new File(contentFolderLocation);
>         // if the index folder indicated exists, we need an incremental
>         // update of the index
>         if (index.exists()) {
>             Trace.TRACE("INDEXING INFO: An index folder exists at: "
>                     + indexFileLocation);
>             deleting = true;
>             create = false;
>             try {
>                 // this version of indexDocs is able to execute the
>                 // incremental update
>                 indexDocs(root, indexFileLocation, create);
>             } catch (Exception e) {
>                 // we were unable to do the incremental update
>                 Trace.TRACE("INDEXING ERROR: Unable to execute incremental update "
>                         + e.getMessage());
>             }
>             // after exiting this block the index should be current with content
>             Trace.TRACE("INDEXING INFO: Incremental update completed.");
>         }
>         try {
>             // create the writer
>             writer = new IndexWriter(index, new StandardAnalyzer(), create);
>             // configure the writer
>             writer.mergeFactor = 10000;
>             writer.maxFieldLength = 100000;
>             try {
>                 // get the start date
>                 Date start = new Date();
>                 // call the indexDocs method, this time to add new documents
>                 Trace.TRACE("INDEXING INFO: Start Indexing new content.");
>                 indexDocs(root, indexFileLocation, create);
>                 Trace.TRACE("INDEXING INFO: Indexing new content complete.");
>                 // optimize the index
>                 writer.optimize();
>                 // close the writer
>                 writer.close();
>                 // get the end date
>                 Date end = new Date();
>                 long totalTime = end.getTime() - start.getTime();
>                 Trace.TRACE("INDEXING INFO: All Indexing Operations Completed in "
>                         + totalTime + " milliseconds");
>             } catch (Exception e1) {
>                 // unable to add new documents
>                 Trace.TRACE("INDEXING ERROR: Unable to index new content "
>                         + e1.getMessage());
>             }
>         } catch (IOException e) {
>             Trace.TRACE("INDEXING ERROR: Unable to create IndexWriter "
>                     + e.getMessage());
>         }
>     }
>
>     /*
>      * Walk directory hierarchy in uid order, while keeping uid iterator
>      * from existing index in sync. Mismatches indicate one of: (a) old
>      * documents to be deleted; (b) unchanged documents, to be left alone;
>      * or (c) new documents, to be indexed.
>      */
>     private static void indexDocs(File file, String index, boolean create)
>             throws Exception {
>         // the index already exists, so we do an incremental update
>         if (!create) {
>             Trace.TRACE("INDEXING INFO: Incremental Update Request Confirmed");
>             // open the existing index
>             reader = IndexReader.open(index);
>             // this gets an enumeration of uid terms
>             uidIter = reader.terms(new Term("uid", ""));
>             // jump to the indexDocs method that does the work;
>             // it uses the enumeration above and does all the
>             // "smart" indexing
>             indexDocs(file);
>             // this will be true every time the index already existed;
>             // we are now going to delete documents that are stale
>             if (deleting) {
>                 Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Started. All Deleted Docs will be listed.");
>                 while (uidIter.term() != null
>                         && uidIter.term().field() == "uid") {
>                     // any uid terms left over at this point belong to
>                     // documents that no longer exist on disk
>                     Trace.TRACE("INDEXING INFO: Deleting document "
>                             + HTMLDocument.uid2url(uidIter.term().text()));
>                     // delete the term from the reader
>                     reader.delete(uidIter.term());
>                     // go to the next term
>                     uidIter.next();
>                 }
>                 Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Completed");
>                 // turn off the deleting flag
>                 deleting = false;
>             } // close the deleting branch
>             uidIter.close(); // close uid iterator
>             reader.close();  // close existing index
>         }
>         // we go here if the index did not already exist
>         else {
>             Trace.TRACE("INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index");
>             // we don't have an existing index, so just add everything
>             indexDocs(file);
>         }
>     }
>
>     private static void indexDocs(File file) throws Exception {
>         // check whether we are at a directory
>         if (file.isDirectory()) {
>             // get a list of the files
>             String[] files = file.list();
>             // sort them
>             Arrays.sort(files);
>             // index each entry in the directory recursively;
>             // we keep repeating this logic until we hit a file
>             for (int i = 0; i < files.length; i++)
>                 // pass the parent directory and the current entry
>                 // into the File constructor and index
>                 indexDocs(new File(file, files[i]));
>         }
>         // we have an actual file, so we need to consider the
>         // file extension so the correct Document is created
>         else if (file.getPath().endsWith(".html")
>                 || file.getPath().endsWith(".htm")
>                 || file.getPath().endsWith(".txt")
>                 || file.getPath().endsWith(".doc")
>                 || file.getPath().endsWith(".xml")
>                 || file.getPath().endsWith(".pdf")) {
>
>             // if this is reached it means we are in the midst
>             // of an incremental update
>             if (uidIter != null) {
>                 // get the uid for the document we are on
>                 String uid = HTMLDocument.uid(file);
>                 // now compare this document to the one we have in the
>                 // enumeration of terms.
>                 // if the term in the enumeration is less than the
>                 // term we are on it must be deleted (if we are indeed
>                 // doing an incremental update)
>                 Trace.TRACE("INDEXING INFO: Beginning Incremental update comparisons");
>                 while (uidIter.term() != null
>                         && uidIter.term().field() == "uid"
>                         && uidIter.term().text().compareTo(uid) < 0) {
>                     // delete stale docs
>                     if (deleting) {
>                         reader.delete(uidIter.term());
>                     }
>                     uidIter.next();
>                 }
>                 // if the terms are equal there is no change with this
>                 // document; we keep it as is
>                 if (uidIter.term() != null && uidIter.term().field() == "uid"
>                         && uidIter.term().text().compareTo(uid) == 0) {
>                     uidIter.next();
>                 }
>                 // if we are not deleting and the document was not there,
>                 // it means we didn't have this document in the last index
>                 // and we should add it
>                 else if (!deleting) {
>                     if (file.getPath().endsWith(".pdf")) {
>                         Document doc = LucenePDFDocument.getDocument(file);
>                         Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>                                 + doc.get("url"));
>                         writer.addDocument(doc);
>                     } else if (file.getPath().endsWith(".xml")) {
>                         Document doc = XMLDocument.Document(file);
>                         Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>                                 + doc.get("url"));
>                         writer.addDocument(doc);
>                     } else {
>                         Document doc = HTMLDocument.Document(file);
>                         Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>                                 + doc.get("url"));
>                         writer.addDocument(doc);
>                     }
>                 }
>             } // end the if for an incremental update
>             // we are creating a new index, add all document types
>             else {
>                 if (file.getPath().endsWith(".pdf")) {
>                     Document doc = LucenePDFDocument.getDocument(file);
>                     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>                             + doc.get("url"));
>                     writer.addDocument(doc);
>                 } else if (file.getPath().endsWith(".xml")) {
>                     Document doc = XMLDocument.Document(file);
>                     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>                             + doc.get("url"));
>                     writer.addDocument(doc);
>                 } else {
>                     Document doc = HTMLDocument.Document(file);
>                     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>                             + doc.get("url"));
>                     writer.addDocument(doc);
>                 } // close the else
>             } // close the else for a new index
>         } // close the else if to handle file types
>     } // close the indexDocs method
> }
>
> ----- Original Message -----
> From: "Craig McClanahan" <[EMAIL PROTECTED]>
> To: "Jakarta Commons Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, November 11, 2004 6:13 PM
> Subject: Re: avoiding locking
>
> > In order to get any useful help, it would be nice to know what you are
> > trying to do, and (most importantly) which commons component is giving
> > you the problem :-). The traditional approach is to put a prefix on
> > your subject line -- for commons package "foo" it would be:
> >
> > [foo] avoiding locking
> >
> > It's also generally helpful to see the entire stack trace, not just
> > the exception message itself.
> >
> > Craig
> >
> > On Thu, 11 Nov 2004 17:27:19 -0500, Luke Shannon
> > <[EMAIL PROTECTED]> wrote:
> > > What can I do to avoid locking issues?
> > >
> > > Unable to execute incremental update Lock obtain timed out:
> > > Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> > >
> > > Thanks,
> > >
> > > Luke
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
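[Editor's note on the uid logic in the code above: the incremental pass is essentially a merge of two sorted sequences -- the uids already stored in the index (walked via the TermEnum) and the uids of the files on disk (visited in sorted order). Below is a minimal, Lucene-free sketch of that merge; the class and method names (IncrementalMergeSketch, plan) are invented purely for illustration and are not part of Lucene.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncrementalMergeSketch {

    /**
     * Given the sorted uids already in the index and the sorted uids of the
     * files currently on disk, decide what an incremental pass should do with
     * each one. In the real demo code a uid encodes path plus last-modified
     * time, so a modified file shows up as a delete of the old uid followed
     * by an add of the new one.
     */
    public static List<String> plan(List<String> indexedUids, List<String> currentUids) {
        List<String> actions = new ArrayList<>();
        int i = 0; // cursor into the sorted list of already-indexed uids
        for (String uid : currentUids) {
            // index terms smaller than the current file's uid have no file
            // on disk any more -> stale, schedule a delete
            while (i < indexedUids.size() && indexedUids.get(i).compareTo(uid) < 0) {
                actions.add("delete:" + indexedUids.get(i));
                i++;
            }
            if (i < indexedUids.size() && indexedUids.get(i).equals(uid)) {
                actions.add("keep:" + uid); // unchanged since the last pass
                i++;
            } else {
                actions.add("add:" + uid);  // new (or modified) file
            }
        }
        // anything left in the index after the walk is stale as well
        while (i < indexedUids.size()) {
            actions.add("delete:" + indexedUids.get(i));
            i++;
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(plan(Arrays.asList("a", "b", "d"),
                                Arrays.asList("b", "c", "e")));
    }
}
```

Where this planner says delete, the code in the thread calls reader.delete(term); where it says add, it builds a Document and calls writer.addDocument(doc); keep simply advances past the term.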

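[Editor's note on the lock timeout itself: the write.lock file in the Tomcat temp directory is how this version of Lucene enforces a single writer, and "Lock obtain timed out" typically means a second writer (or a deleting IndexReader) was opened while another was still active, or a previous run died without closing one. Since the CMS calls Index() on every content change, one possible mitigation -- sketched here without any Lucene API, with the IndexGuard name and runExclusively method invented for illustration -- is to serialize all indexing passes through a single JVM-level lock so they can never overlap.]

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class IndexGuard {
    // One JVM-wide lock so only a single thread mutates the index at a time.
    private static final ReentrantLock INDEX_LOCK = new ReentrantLock();

    /**
     * Runs the indexing task only if the lock can be acquired within the
     * timeout. Returns true if the task ran, false if another indexing pass
     * was still active (or the thread was interrupted while waiting).
     */
    public static boolean runExclusively(Runnable indexTask, long timeoutSeconds) {
        boolean acquired;
        try {
            acquired = INDEX_LOCK.tryLock(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // another indexing pass is still running; skip or retry
        }
        try {
            indexTask.run();
            return true;
        } finally {
            INDEX_LOCK.unlock(); // always release, even if indexing throws
        }
    }
}
```

Callers would then invoke IndexGuard.runExclusively(Indexer::Index, 30) instead of calling Indexer.Index() directly. Closing the reader and writer in finally blocks matters just as much: an unclosed writer leaves the -write.lock file behind for the next run to trip over.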