Sorry, it is the 190,000th documents

maureen tanuwidjaja Sun, 28 Jan 2007 09:58:13 -0800

Hi,

I'm sorry, I just realized that it is NOT the 10,000th document that raises
the exception when IndexWriter.addDocument(Document) is called. It is document
180,000 + 10,000, so the 190,000th document.
  
I am now running the program again, with code added to print the stack trace
if an exception occurs (thanks for the advice, Erick).
  
OK. Basically, what I am going to index is XML documents in 22 folders, where
each folder contains 30,000 XML documents, so the total is 660,000 XML
documents. I was reading the Lucene book and spotted the section about
mergeFactor. I would like to know whether mergeFactor plays an important part
in indexing these files, and whether it might be strongly correlated with the
exception. I run my program with the default value of mergeFactor, which is 10.
  
In case it is needed, the PC has the following spec: Intel Pentium 4,
2.40 GHz CPU, 512 MB of RAM.
  
  
Is there any suggestion for the mergeFactor and maxMergeFactor values that I
should use in my case?
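
To make the mergeFactor question concrete, here is a rough sketch of how merge sizes grow under a simplified logarithmic merge model. It is not Lucene's actual code: the class and method names are hypothetical, and it assumes segments are flushed every maxBufferedDocs documents (assumed to be 10 here) and that mergeFactor segments of equal size are merged into one as soon as they exist.

```java
// Simplified model of logarithmic segment merging. Not Lucene's implementation;
// MergeModel and largestMergeAt are hypothetical names used for illustration only.
final class MergeModel {

    /**
     * Returns the size (in documents) of the largest segment produced by a merge
     * that is triggered when the docCount-th document is added, or 0 if adding
     * that document triggers no merge in this model.
     */
    static long largestMergeAt(long docCount, int mergeFactor, int maxBufferedDocs) {
        if (docCount <= 0 || mergeFactor < 2 || maxBufferedDocs < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long largest = 0;
        // Smallest merged segment: mergeFactor flushed segments of maxBufferedDocs docs.
        long size = (long) maxBufferedDocs * mergeFactor;
        // A merge producing 'size' docs fires whenever docCount is a multiple of 'size';
        // each level up is mergeFactor times larger than the one below it.
        while (size <= docCount && docCount % size == 0) {
            largest = size;
            size *= mergeFactor;
        }
        return largest;
    }

    public static void main(String[] args) {
        long[] checkpoints = {10000L, 100000L, 190000L, 200000L};
        for (long docs : checkpoints) {
            System.out.println("doc " + docs + ": largest merge = "
                    + largestMergeAt(docs, 10, 10) + " docs");
        }
    }
}
```

Under these assumptions, the 190,000th document triggers a merge that produces a 10,000-document segment, the same size as the merge at the 10,000th document, while the much larger 100,000-document merges fall at 100,000 and 200,000. A larger mergeFactor makes merges less frequent but each one bigger.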
  
  
  
  Thanks and Regards,
  Maureen
  
  
Erick Erickson <[EMAIL PROTECTED]> wrote:

Maureen:


I lost the e-mail where you re-throw the exception. But you'd get a *lot*
more information if you'd print the stacktrace via
catch (Exception e) {
    e.printStackTrace();
    throw e;
}

And that would allow the folks who understand Lucene to give you a LOT more
help ...
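
A slightly fuller sketch of the same idea, for anyone who wants to see it in context. The IndexStep interface is a hypothetical stand-in for the real indexing call (for example a call that adds one document to the IndexWriter), and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch only: IndexStep is a hypothetical stand-in for the real indexing call.
final class StackTraceDemo {

    interface IndexStep {
        void run() throws IOException;
    }

    /** Renders a throwable's stack trace as a String, e.g. for writing to a log file. */
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    /** Runs one indexing step; on failure, reports the file and full trace, then rethrows. */
    static void runAndReport(String fileName, IndexStep step) throws IOException {
        try {
            step.run();
        } catch (IOException | RuntimeException e) {
            System.err.println("Failed while indexing " + fileName);
            e.printStackTrace();   // full trace, including any "Caused by" chain
            throw e;               // rethrow so the caller still sees the failure
        }
    }
}
```

Printing the file name next to the trace also shows which document was being added when the exception was raised.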

Best
Erick

On 1/27/07, Chris Hostetter  wrote:
>
>
> did you try triggering a thread dump to see what it was doing at that
> point?
>
> depending on your merge factors and other IndexWriter settings it could
> just be doing a really big merge.
>
> : Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST)
> : From: maureen tanuwidjaja 
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: My program stops indexing after 10000th documents is indexed
> :
> : Hi all,
> :
> :   Is there any limit on the number of files that Lucene can handle?
> :   I am indexing a total of 30000 XML documents, but it stops at the
> : 10000th document.
> :   No warning, no error, and no exception either.
> :   ....
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491876.xml
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491886.xml
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491887.xml
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491891.xml
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491893.xml
> :   Indexing C:\sweetpea\wikipedia_xmlfiles\part-180000\491896.xml <--10000th doc
> :   --it idles here--
> :
> :   At first I thought that the 10000th document was so big that it took
> : a long time to put into the index. Then I found out that the 10000th
> : document is only 6 KB. The indexing process stalled for about 1 hour, so
> : I decided to terminate it.
> :
> :   Does this have anything to do with something like setCompoundFiles
> : etc.? I don't call any of those in my program...
> :
> :   Any suggestions, please?
> :
> :   Thanks and Best Regards,
> :
> :   Maureen
>
>
>
> -Hoss

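On the thread dump suggested above: besides pressing Ctrl+Break in the Windows console running the JVM, a dump can be produced from inside the program. The following is a minimal sketch using only the standard library; the class and method names are hypothetical. It shows what each thread is doing at that moment, which helps tell a long merge apart from a real hang.

```java
import java.util.Map;

// Sketch: builds a text dump of every live thread's stack using the standard library.
final class ThreadDump {

    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() from a separate watchdog thread when no document has been indexed for some minutes would show whether the indexing thread is inside a merge.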

 