Hi Hetan
Th's the major Problem of non Standatrdized Tags for HTML Document's u are Indexing ,resulting in lag time taken for Indexing process.... If u can Tweak the HTMLParser.jj file within lucene.zip '/demo/html' file [U have to have some Knowledge of JAVACC for this]. Karthik -----Original Message----- From: Hetan Shah [mailto:[EMAIL PROTECTED] Sent: Thursday, August 26, 2004 3:01 AM To: Lucene Users List Subject: Time to index documents Hello all, Is there a way to reduce the indexing time taken when the indexer is indexing about 30,000 + files. It is roughly taking around 6-7 hours to do this. I am using IndexHTML class to create the index out of HTML files. Another issue that I see is every once in a while I get the following output on the screen. adding ../31/1104852.html Parse Aborted: Encountered "\"" at line 7, column 1. Was expecting one of: <ArgName> ... "=" ... <TagEnd> ... Any suggestions on preventing this from happening? Thanks in advance. -H --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
