My guess is garbage collection -- try allocating twice as much heap as before,
or more. Try running with -verbose:gc to confirm.
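Concretely, something like the following (a sketch -- the 512m heap size and the index/docs paths are placeholders to adjust for your machine, and the class name is the demo indexer from the question):

```shell
# Double the maximum heap (-Xmx) and log each garbage collection.
# -verbose:gc prints a line per collection, so you can tell whether the
# "hang" is really the JVM thrashing in back-to-back full GCs.
java -Xmx512m -verbose:gc org.apache.lucene.demo.IndexHTML -create -index /tmp/index html_docs/
```

If the GC log shows repeated full collections that reclaim almost no memory, the heap is simply too small for that 55,700-word document and raising -Xmx should unstick it.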

  Cheers,
  Winton


>Question from a Lucene newbie... I'm trying to index a file structure which
>happens to include a relatively large file (310kb with 55,700 words) and
>for some reason it appears to be hanging the whole indexing process.  Here's a
>quick run-down...
>
>1) Am using a webcrawler to retrieve files and copy to my local disk.
>2) For files like .pdf's... I'm copying an .html equivalent of the file to
>my disk (but leaving .pdf extension).
>3) Then later in a separate batch process I run pretty much the standard
>out of the box "org.apache.lucene.IndexHTML" demo class (except I've added
>.pdf as a possible indexing type).
>
>That's about it.  No big deal.  The transformation from pdf to html is not
>perfected yet either... so file size will definitely drop in the future...
>as nonsense terms are being included in these files.  But for now... what
>should I be looking at or altering to find out what is causing the hang?
>Thanks!
>
>Jon Wasson
>
>
>--
>To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
>For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>


Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/

