Question from a Lucene newbie... I'm trying to index a file structure that happens to include a relatively large file (310 KB, 55,700 words), and for some reason it appears to be hanging the whole indexing process. Here's a quick run-down:
1) I'm using a web crawler to retrieve files and copy them to my local disk.
2) For files like PDFs, I'm copying an HTML equivalent of the file to my disk (but keeping the .pdf extension).
3) Later, in a separate batch process, I run pretty much the standard out-of-the-box "org.apache.lucene.IndexHTML" demo class (except that I've added .pdf as a possible indexing type).

That's about it. No big deal. The PDF-to-HTML transformation is not perfected yet either, so file sizes will definitely drop in the future, since nonsense terms are currently being included in these files. But for now, what should I be looking at or altering to find out what is causing the hang?

Thanks!
Jon Wasson
