Re: java on 64 bits

2005-10-27 Thread Roxana Angheluta

Hello everyone!

Here are the conclusions we reached after digging deeper into the
problem; maybe they will help someone:


1) The filling of the hard drive was not due to 64-bit Java; that was
coincidental.
2) The intermediate files Yonik talked about (*.f*) were present because
the indexing process was merging very large segments, which took a while
to complete.
3) We are indexing a continuous stream of data. As documents go
out-of-date they are deleted from the index. In order to sustain indexing
throughput we use a batch indexing strategy, setting mergeFactor to 50
but never optimizing (a sketch follows this list). The downside is that
it takes a long time before we reach the point where deleted documents
are purged, which only happens when out-of-date segments are merged. This
means we end up with large segments that contain nothing but deleted
documents, which could be removed if they weren't still listed in the
segments file.
4) Assuming that merging into a large segment more frequently doesn't
hurt indexing throughput, we should probably have implemented the
strategy described by Doug Cutting here (scroll down):
http://www.gossamer-threads.com/lists/lucene/java-user/29350?page=last
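
For concreteness, here is a minimal sketch of the setup described in
point 3, against the Lucene 1.4-era API (the public mergeFactor field and
IndexReader-based deletes); the index path and the "id" field are made up
for illustration:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        String indexPath = "/path/to/index";  // hypothetical location

        // Purge out-of-date documents first; in this era of the API,
        // deletes go through IndexReader, keyed by a unique term.
        IndexReader reader = IndexReader.open(indexPath);
        reader.delete(new Term("id", "expired-doc-42"));  // made-up field
        reader.close();

        // Batch indexing: a high mergeFactor keeps many segments around
        // and merges rarely, favoring indexing throughput over disk
        // usage. We never call optimize(), so deleted documents are only
        // reclaimed when their segments happen to be merged.
        IndexWriter writer = new IndexWriter(indexPath,
                new StandardAnalyzer(), false);
        writer.mergeFactor = 50;  // public field in Lucene 1.4.x
        // ... writer.addDocument(doc) for each incoming document ...
        writer.close();
    }
}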


Hth,
casper & roxana

Thanks everyone for the answers!
I'm experimenting with your suggestions; I will let you know if
something interesting pops up.


roxana




Re: java on 64 bits

2005-10-21 Thread Roxana Angheluta

Thank you, Yonik, it seems this is the case.
What can we do about it? Would running the program with java -d32 be
a solution?


Thanks again,
roxana

One possibility: if Lucene runs out of memory while adding or optimizing,
it can leave unused files behind that increase the size of the index. A
64-bit JVM will require more memory than a 32-bit one because the size of
all references is doubled.

If you are using the compound file format (the default - check for .cfs
files), then it's easy to check if you have this problem by seeing if there
are any *.f* files in the index directory. These are intermediate files and
shouldn't exist for long in a compound-file index.

-Yonik
Now hiring -- http://tinyurl.com/7m67g
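
As a quick way to run Yonik's check, a small sketch (the index path is
hypothetical) that lists any *.f* leftovers in an index directory:

import java.io.File;
import java.io.FilenameFilter;

public class FindLeftoverFiles {
    public static void main(String[] args) {
        File indexDir = new File("/path/to/index");  // hypothetical path
        // Match the *.f* pattern: norm and field files (e.g. _a.f1,
        // _a.fnm) that should not persist in a compound-file (.cfs) index.
        String[] leftovers = indexDir.list(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.indexOf(".f") != -1;
            }
        });
        for (int i = 0; i < leftovers.length; i++) {
            System.out.println(leftovers[i]);
        }
    }
}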





Re: java on 64 bits

2005-10-21 Thread Volodymyr Bychkoviak

You can also try to clean up the index with Luke.


--
regards,
Volodymyr Bychkoviak





Re: java on 64 bits

2005-10-21 Thread Yonik Seeley
1) make sure the failure was due to an OutOfMemory exception and not
something else.
2) if you have enough memory, increase the max JVM heap size (-Xmx).
3) if you don't need more than 1.5G or so of heap, use the 32-bit JVM
instead (depending on architecture, it can actually be a little faster
because more references fit in the CPU cache).
4) see how many indexed fields you have and whether you can consolidate
any of them.
4.5) if you don't have too many indexed fields, and have enough spare file
descriptors, try using the non-compound file format instead (see the
sketch below).
5) run with the latest version of Lucene (the 1.9 dev version), which may
have better memory usage during optimizes & segment merges.
6) if/when optional norms
(http://issues.apache.org/jira/browse/LUCENE-448)
make it into Lucene, you can apply them to any indexed fields for which
you don't need index-time boosting or length normalization.
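
To illustrate points 2 and 4.5, a minimal sketch (the path and class name
are made up; setUseCompoundFile is the Lucene 1.4-era switch for the
non-compound format):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Point 2: raise the max heap at launch, e.g.
//   java -Xmx2048m -cp lucene.jar:. NonCompoundIndexer
public class NonCompoundIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        // Point 4.5: write each segment as individual files rather than
        // a single .cfs; this uses more file descriptors but avoids the
        // extra copying work of building compound files during merges.
        writer.setUseCompoundFile(false);
        // ... add documents ...
        writer.close();
    }
}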

As for getting rid of your current intermediate files, I'd rebuild from
scratch just to ensure things are OK.

-Yonik
Now hiring -- http://tinyurl.com/7m67g





RE: java on 64 bits

2005-10-21 Thread Aigner, Thomas
I have seen quite a few posts on using the 1.9 dev version for
production. How stable is it? Is it really ready for production?
I would like to use it... but I never put beta packages into
production... then again, I'm always dealing with Microsoft :)

Tom




Re: java on 64 bits

2005-10-21 Thread jian chen
Hi,

Also, I think you could try increasing the indexInterval. It is set to
128 by default, but making it larger makes the .tii files smaller. Since
the .tii files are loaded into memory as a whole, your memory usage
should drop accordingly. However, this change might affect your search
speed, so be careful about the value you set; don't go too high.

Just my thoughts, hope it helps.

Jian
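
A sketch of Jian's suggestion, assuming the 1.9 dev API where IndexWriter
exposes setTermIndexInterval (default 128); the index path is
hypothetical:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WiderTermIndexInterval {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        // Every Nth term is loaded into the in-memory term index (.tii).
        // Quadrupling the interval from the default 128 shrinks the .tii
        // to roughly a quarter of its size, at the cost of a longer
        // linear scan within each block on term lookup, so search may
        // get a bit slower.
        writer.setTermIndexInterval(512);
        // ... add documents ...
        writer.close();
    }
}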





java on 64 bits

2005-10-20 Thread Roxana Angheluta

Hi everybody!

We have a large Lucene index which gets updated very often.
Until recently, the Java virtual machine used to manage the index was
32-bit, although the program was running on a 64-bit machine. Last week
we switched to a 64-bit Java, and since then we have been experiencing
strange problems: the index grows very large. I'm not sure the two are
related; that's why I'm asking here: is it possible that the index got
corrupted after we updated the JVM? Is there any relation between the
size of the index and the JVM used?


I hope the questions make sense, thanks,
roxana
