Urgent: How much disk space is actually needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Dear All, How much disk space is actually needed to optimize the index? The explanation given in the documentation seems to be very different from the practical situation. I have an index file of size 18.6 GB and I am going to optimize it. I keep this index on a mobile hard disk with
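The disk-space question that opens this thread has a common rule of thumb: optimize() rewrites every segment into one, so it can transiently need about 2x the index size in extra free space, and up to 3x when an IndexReader is still open on the index or a compound-file conversion forces one more copy. A minimal sketch of that arithmetic (the 3x multiplier is a conservative rule of thumb drawn from the list discussion, not an exact guarantee):

```java
// Rough worst-case free-space estimate for IndexWriter.optimize().
// The 3x multiplier is a conservative rule of thumb: old segments
// + the merged copy (+ a compound-file conversion pass).
public class OptimizeSpaceEstimate {
    static double worstCaseFreeGb(double indexSizeGb) {
        return 3.0 * indexSizeGb;
    }

    public static void main(String[] args) {
        // For the 18.6 GB index discussed here: ~56 GB of free space
        // is the safe budget before starting the optimize.
        System.out.println(worstCaseFreeGb(18.6) + " GB");
    }
}
```

By this estimate, the 38 GB of free space mentioned later in the digest is plausibly enough for a plain 2x merge but cuts it close against the 3x worst case.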

Re: Urgent: How much disk space is actually needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
to 3 minutes when searching inside this unoptimized index. How about the memory consumption? Will it take a greater amount of memory if using the optimized one? Thanks a lot. Regards, Maureen Michael McCandless [EMAIL PROTECTED] wrote: maureen tanuwidjaja wrote

Re: Urgent: How much disk space is actually needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
have the search results in 30 to 3 minutes, which is actually quite unacceptable for the search engine I am building... Is there any recommendation on how searching could be made faster? Thanks, Maureen Michael McCandless [EMAIL PROTECTED] wrote: maureen tanuwidjaja wrote: One

Re: Urgent: How much disk space is actually needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Oops, sorry, mistyping... I have the search results in 30 SECONDS to 3 minutes, which is actually quite unacceptable for the search engine I am building... Is there any recommendation on how searching could be made faster? maureen tanuwidjaja [EMAIL PROTECTED] wrote: Hi Mike, The only

Re: Urgent: How much disk space is actually needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Hi Mike, How do I disable/turn off the norms? Is it done while indexing? Thanks, Maureen

How to disable the Lucene norm factor?

2007-03-13 Thread maureen tanuwidjaja
Hi all, How do I disable the Lucene norm factor? Thanks, Maureen
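For background on the norms question: norms are the length-normalization factors that an open IndexReader loads into the heap as roughly one byte per document per indexed field. In the Lucene 2.x API of this era they can be skipped at indexing time with Field.Index.NO_NORMS or Field.setOmitNorms(true), at the cost of losing length normalization in scoring. A sketch of the memory arithmetic behind the advice (the field count of 20 is a hypothetical illustration, not from the thread):

```java
// Norms cost ~1 byte per document per indexed field while an
// IndexReader is open, so their footprint is simple arithmetic.
public class NormsMemory {
    static long normsBytes(long numDocs, int indexedFields) {
        return numDocs * (long) indexedFields;
    }

    public static void main(String[] args) {
        // 660,000 docs (the collection size in this digest) with a
        // hypothetical 20 indexed fields: ~12 MB of heap just for norms.
        long bytes = normsBytes(660_000, 20);
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

A modest cost per field, but it scales with every indexed field and every document, which is why omitting norms on fields that don't need length normalization was the suggestion here.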

Re: How to disable the Lucene norm factor?

2007-03-13 Thread maureen tanuwidjaja
OK Mike, I'll try it and see whether it works :) Then I will proceed to optimize the index. Well then, I guess it's fine to use the default value for maxMergeDocs, which is Integer.MAX_VALUE? Thanks a lot. Regards, Maureen Michael McCandless [EMAIL PROTECTED] wrote: maureen

lengthNorm accessible?

2007-03-13 Thread maureen tanuwidjaja
. Thanks, Xiaocheng maureen tanuwidjaja wrote: Ya... I think I will store it in the database so that later it could be used in scoring/ranking for retrieval... :) Another thing I would like to see is whether the precision or recall will be much affected by this... Regards, Maureen

Re: Optimizing Index

2007-02-22 Thread maureen tanuwidjaja
PROTECTED] wrote: maureen tanuwidjaja wrote: I had an existing index file of size 20.6 GB... I haven't done any optimization on this index yet. Now I have an HDD of 100 GB, but apparently when I create a program to optimize (which simply calls writer.optimize() on this index file), it gives

Optimizing Index

2007-02-21 Thread maureen tanuwidjaja
Hi, I had an existing index file of size 20.6 GB... I haven't done any optimization on this index yet. Now I have an HDD of 100 GB, but apparently when I create a program to optimize (which simply calls writer.optimize() on this index file), it gives the error that there is not enough space

Searching eats lots of memory?

2007-02-21 Thread maureen tanuwidjaja
I would also like to know whether searching in the index file eats lots of memory... I always run out of memory when searching, i.e. it gives the exception java heap space (although I have put -Xmx768 in the VM arguments)... Is there any way to solve it?
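One detail worth flagging in the message above: -Xmx768 carries no unit suffix, so the JVM does not read it as 768 MB; a suffix (k, m, or g) is required for the intended heap size. A corrected invocation, with a hypothetical main class name standing in for the poster's search program:

```shell
# Allocate a 768 MB maximum heap; the unit suffix (m) is required,
# since a bare -Xmx768 is not interpreted as megabytes.
java -Xms128m -Xmx768m MySearcher
```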

Is there any way to optimize an existing unoptimized index?

2007-02-07 Thread maureen tanuwidjaja
Hi, May I also ask whether there is a way to use writer.optimize() without indexing the files from the beginning? It took me about 17 hrs to finish building an unoptimized index (finished when I call IndexWriter.close()). I just wonder whether this existing index could be optimized...
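The answer to the question in this thread is that no re-indexing is needed: an IndexWriter can be opened on the existing index directory with create=false and optimize() called directly. A minimal sketch against the Lucene 2.x API of this era (the index path is a hypothetical stand-in for the poster's directory; this sketch assumes the Lucene jar on the classpath and is not a definitive recipe):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class OptimizeExisting {
    public static void main(String[] args) throws Exception {
        // create=false: open the existing index rather than overwriting it
        IndexWriter writer =
            new IndexWriter("C:/sweetpea/index", new StandardAnalyzer(), false);
        try {
            writer.optimize(); // merge all segments down to one
        } finally {
            writer.close();    // always release the write lock
        }
    }
}
```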

Building a Lucene index using a 100 GB mobile hard disk

2007-02-01 Thread maureen tanuwidjaja
Dear All, I was indexing 660,000 XML documents. The unoptimized index file was successfully built in about 17 hrs... This index file resides on my D drive, which has 38 GB free. This space is insufficient for optimizing the index file -- I read what the Lucene documentation says about its

RE: Building a Lucene index using a 100 GB mobile hard disk

2007-02-01 Thread maureen tanuwidjaja
: maureen tanuwidjaja [mailto:[EMAIL PROTECTED]] Sent: 01 February 2007 14:22 To: java-user@lucene.apache.org Subject: Building lucene index using 100 Gb Mobile HardDisk Dear All, I was indexing 660,000 XML documents. The unoptimized index file was successfully built in about 17 hrs... This index

IndexWriter can't add the 10,000th document to the index

2007-01-28 Thread maureen tanuwidjaja
factors and other IndexWriter settings it could just be doing a really big merge. : Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST) : From: maureen tanuwidjaja : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: My program stops indexing after the 10,000th document

Sorry, it is the 190,000th document

2007-01-28 Thread maureen tanuwidjaja
triggering a thread dump to see what it was doing at that point? depending on your merge factors and other IndexWriter settings it could just be doing a really big merge. : Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST) : From: maureen tanuwidjaja : Reply-To: java-user@lucene.apache.org : To: java

Printout of the stack trace while failing to index the 190,000th document

2007-01-28 Thread maureen tanuwidjaja
OK, this is the printout of the stack trace while failing to index the 190,000th document: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491891.xml

Re: Printout of the stack trace while failing to index the 190,000th document

2007-01-28 Thread maureen tanuwidjaja
I think so... Btw, may I ask your opinion: will it be useful to optimize, let's say, every 50,000-60,000 documents? I have a total of 660,000 docs... Erik Hatcher [EMAIL PROTECTED] wrote: On Jan 28, 2007, at 9:15 PM, maureen tanuwidjaja wrote: OK, this is the printout of the stack trace while failing

My program stops indexing after the 10,000th document is indexed

2007-01-27 Thread maureen tanuwidjaja
Hi all, Is there any limitation on the number of files that Lucene can handle? I indexed a total of 30,000 XML documents; however, it stops at the 10,000th document. No warning, no error, no exception either. Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491876.xml Indexing

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2007-01-26 Thread maureen tanuwidjaja
Hi Mike and Erick and all, I have fixed my code and yes, indexing is much faster than previously, when I was doing such hammering with IndexWriter. However, I am now encountering an error while indexing: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. This error

Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2007-01-26 Thread maureen tanuwidjaja
Er... where shall I put that -XX:MaxPermSize=128m? Thanks Pustovalov. Regards, Maureen Mikhail Pustovalov [EMAIL PROTECTED] wrote: try this: -XX:MaxPermSize=128m On Fri, 26 Jan 2007 19:32:45 +0300, maureen tanuwidjaja wrote: Hi Mike and Erick

Re: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2007-01-26 Thread maureen tanuwidjaja
Oh, thanks then :) Mikhail Pustovalov [EMAIL PROTECTED] wrote: in your java command line, of course :) Example: java -Xms128m -Xmx1024m -server -Djava.awt.headless=true -XX:MaxPermSize=128m protei.Starter On Fri, 26 Jan 2007 19:39:13 +0300, maureen tanuwidjaja wrote

Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
Hi, I am indexing thousands of XML documents, and it stops after indexing for about 7 hrs... Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37003.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37004.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37008.xml Indexing

Re: Building Lucene index for XML document

2007-01-25 Thread maureen tanuwidjaja
and Best regards ^^ Maureen maureen tanuwidjaja [EMAIL PROTECTED] wrote: Thanks a lot Daniel :) Regards, Maureen Daniel Noll wrote: maureen tanuwidjaja wrote: Before implementing this search engine, I designed it to build the index in such a way that every XML tag is converted using

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
dunno whether 7 hrs later it will raise the same problem "Lock obtain timed out" 4. I use the latest version of Lucene (nightly build) Thanks and Regards, Maureen Michael McCandless [EMAIL PROTECTED] wrote: maureen tanuwidjaja wrote: I am indexing thousands of XML documents

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
it in main is a recipe for disaster. Trust me on this one, I've spent way more time than I'd like to admit debugging this kind of problem. Best, Erick On 1/25/07, maureen tanuwidjaja wrote: Hi Mike, thanks for the reply... 1. Here is the class that I use for indexing... package
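The "Lock obtain timed out" error in this thread usually means a stale write.lock: an earlier IndexWriter was never closed (for example the JVM died, or close() was skipped when indexing threw), or two writers were opened on the same index at once. The pattern being recommended here can be sketched as follows against the Lucene 2.x API (the path is hypothetical and indexing is elided to a comment; a sketch, not a definitive implementation):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class SafeIndexing {
    public static void main(String[] args) throws Exception {
        // Open exactly one writer on the index (create=false keeps
        // the existing index; only one writer may hold the lock).
        IndexWriter writer =
            new IndexWriter("C:/sweetpea/index", new StandardAnalyzer(), false);
        try {
            // ... add documents here ...
        } finally {
            writer.close(); // releases write.lock even if indexing throws
        }
    }
}
```

Closing the writer in a finally block is what prevents the stale lock from surviving an exception and blocking the next run.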

Re: Building Lucene index for XML document

2007-01-25 Thread maureen tanuwidjaja
Thanks Doron =) Regards, Maureen Doron Cohen [EMAIL PROTECTED] wrote: Hi Maureen, Some relevant info is in the file formats doc - http://lucene.apache.org/java/docs/fileformats.html Regards, Doron maureen tanuwidjaja wrote on 25/01/2007 01:31:25: btw Daniel, can you please give me

Building Lucene index for XML document

2007-01-24 Thread maureen tanuwidjaja
Hi... I am a final-year undergrad. My final-year project is a search engine for XML documents... I am currently building this system using Lucene. An example of XML elements from an XML document: article, body, section