I am using Lucene to index XML+HTML files. The XML contains the metadata associated 
with the HTML file.

The process, at a high level, is:
- create a list of all XML files in a folder
- parse each XML file using a SAX parser
- create name:value pairs out of the tags and values, and index them
- one of the tags contains the URL of the HTML page
- when that tag is encountered, parse the HTML file
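
The XML step of the process above can be sketched roughly as follows. This is only an illustration using the JDK's built-in SAX parser, not the actual indexing code: the class name XmlFieldExtractor and the sample element names (title, url) are made up, and the Lucene calls are left out and only marked with a comment. Each file gets its own parser and handler, so no state is shared between files.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: turns the leaf elements of one XML file into name:value pairs.
class XmlFieldExtractor {

    /** Collects the text of each element that has non-blank text content. */
    static class FieldHandler extends DefaultHandler {
        final Map<String, String> fields = new LinkedHashMap<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // SAX may deliver the text in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                // This is where a name:value pair would be added to the index document.
                fields.put(qName, value);
            }
            text.setLength(0);
        }
    }

    /** Parses one XML stream with a fresh parser and handler. */
    static Map<String, String> extract(InputStream xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        FieldHandler handler = new FieldHandler();
        parser.parse(xml, handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        // Sample metadata; "title" and "url" are assumed element names.
        String sample = "<doc><title>Hello</title><url>http://example.com/a.html</url></doc>";
        Map<String, String> fields =
                extract(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

The value found under the URL tag would then be used to locate and parse the HTML page.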

When I do this for a few files, it seems to work fine. However, as the number of files 
increases, it starts to throw errors.
First, I get a "SAXException: Content is not allowed in trailing section." But I 
checked, and the XML file seems to be well-formed. I even tried indexing this file 
on its own, and it worked.
Then I get "Index locked for write: Lock@/export/home.../write.lock".
At times, I also get "Timed out waiting for: Lock@/export/home/.../commit.lock".

As a result, the index does not get updated and the search results are incorrect. I 
also observed once that while the index was being built I got results, but once the 
indexer exited I stopped getting results. My hunch is that the index update did not get 
committed?

What is particularly interesting is that the problem occurs only some of the 
time. Another observation is that it worked fine for around 50 files, but not for 
about 100 files.

Can anyone help me, or give pointers as to what is going on here?

-rishabh


 

