RE: JVM Crash in Lucene

2006-02-08 Thread Daniel Pfeifer
I resolved this issue for the time being by adding the following
parameter to the java command line:

-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody
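For reference, the same directive can also be placed in a .hotspot_compiler
file in the JVM's working directory (a sketch, assuming a Sun 1.4/1.5
HotSpot VM, which reads this file at startup; it is equivalent to the flag
above):

```text
exclude org/apache/lucene/index/IndexReader$1 doBody
```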

/Daniel

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Re: JVM Crash in Lucene

2006-02-07 Thread Daniel Pfeifer
I've got the same problem, running 1.5.0_05 on Solaris 10. I've seen that
this issue has been reported on Sun's forum, but there is no answer yet.

Another interesting thing I noticed: we previously used RAMDirectory and
never got JVM crashes with it. However, once we started using
FSDirectory, the JVM started to crash.

I tried adding the -Xcomp parameter and the JVM has not crashed yet. But
then again, the SearchService hasn't been up long enough to be sure that
this solved the problem.

/Daniel

 You also might try -Xbatch or -Xcomp to see if that fixes it (or
 reproduces it faster).

 Here's a great list of JVM options:
 http://blogs.sun.com/roller/resources/watt/jvm-options-list.html

 -Yonik

 On 12/11/05, Yonik Seeley [EMAIL PROTECTED] wrote:
  Sounds like it's a hotspot bug.
  AFAIK, hotspot doesn't just compile a method once... it can do
  optimization over time.
 
  To work around it, have you tried the previous version, 1.5.0_05?
  It's possible it's a fairly new bug.  We've been running with that
  version and Lucene 1.4.3 without problems (on Opteron, RHEL4).
 
  You could also try the latest Lucene 1.9 to see if that changes
enough
  to avoid the bug.
 
  -Yonik
 
  On 12/11/05, Dan Gould [EMAIL PROTECTED] wrote:
   First, thank you Chris, Yonik, and Dan for your ideas as to what
might be
   causing this problem.
  
   I tried moving things around so that the IndexReader is still open
   when it calls TermFreqVector.getTerms() /
   TermFreqVector.getTermFrequencies().  It didn't seem to make any
   difference.
  
   I also tried running Java with the flags:
   -Xmx2048m -XX:MaxPermSize=200m
   (the box has 4GB of RAM) and it still crashes.  It's hard to tell,
   but the program does seem to run for a lot longer (maybe 10 hours),
   but that could just be randomness in my tests.
  
   The JVM always seems to crash with
  
 Current CompileTask:
 opto:1836
 org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object;
 (99 bytes)
  
   which in the Lucene source is:
  
  private static IndexReader open(final Directory directory,
      final boolean closeDirectory) throws IOException {
    synchronized (directory) {          // in- & inter-process sync
      return (IndexReader) new Lock.With(
          directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
          IndexWriter.COMMIT_LOCK_TIMEOUT) {
        public Object doBody() throws IOException {
          SegmentInfos infos = new SegmentInfos();
          infos.read(directory);
          if (infos.size() == 1) {      // index is optimized
            return SegmentReader.get(infos, infos.info(0), closeDirectory);
          }
          IndexReader[] readers = new IndexReader[infos.size()];
          for (int i = 0; i < infos.size(); i++)
            readers[i] = SegmentReader.get(infos.info(i));
          return new MultiReader(directory, infos, closeDirectory, readers);
        }
      }.run();
    }
  }
  
   that's definitely a non-trivial bit of code, but I can't imagine
   that there's a problem that I'm seeing that no one else is.
   Moreover, that code gets run hundreds or even thousands of times
   before it crashes, so I don't imagine it's being HotSpot-compiled
   for the first time.
  
   I'm running the 1.4.3 release and the 1.5.0_06-b05 JVM on CentOS
   Linux on an Opteron.
  
   Any further guesses?
  
   Thank you all very much,
   Dan
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Sending query to multiple servers and combine all Hits from them ?

2006-02-01 Thread Daniel Pfeifer
You can search all four servers by doing this (the QueryParser in this
example uses the Lucene 1.9 syntax):

Searchable[] searchables = new Searchable[] {
    (Searchable) Naming.lookup("x1"),
    (Searchable) Naming.lookup("x2"), ... };

MultiSearcher multiSearcher = new MultiSearcher(searchables);

Hits hits = multiSearcher.search(
    new QueryParser("title", new StandardAnalyzer()).parse("title:Ajax"));

 

/Daniel

 



From: Vikas Khengare [mailto:[EMAIL PROTECTED] 
Sent: den 2 februari 2006 06:30
To: lucene-user@jakarta.apache.org; lucene-dev@jakarta.apache.org;
java-dev@lucene.apache.org; java-user@lucene.apache.org;
java-commits@lucene.apache.org
Subject: Sending query to multiple servers and combine all Hits from
them ?

 

Hi friends,

I am building a search application with the following scenario.

Architecture ==

1] A common GUI.

2] When a user enters a query, it should go to 4 search servers (all
servers are on remote machines).

3] After searching, all 4 servers return results, i.e. Hits (each
server returns its hit documents in a different format).

4] Combine all the Hits into one kind of result.

5] Show that result to the user in a uniform way.

Problems ==

1] How do I send my search query to all 4 search servers?

2] After searching, how do I get all the results (Hits) and combine
them into one bundle, given that they are all in different formats?

3] Should I use AJAX for sending the query across multiple servers and
getting the results back (or some other technology; if so, which)?

4] How do I combine all the hits?

 

 

 



Thanks.

Best Regards
[ [EMAIL PROTECTED] ]



RE: Performance tips?

2006-01-27 Thread Daniel Pfeifer
Well,

We are sporting Solaris 10 on a Sun Fire machine with four cores and
12GB of RAM and mirrored Ultra 320 disks. I guess I could try switching
to FSDirectory and hope for the best.

-Original Message-
From: Chris Lamprecht [mailto:[EMAIL PROTECTED] 
Sent: den 27 januari 2006 08:50
To: java-user@lucene.apache.org
Subject: Re: Performance tips?

I seem to say this a lot :), but assuming your OS has a decent
filesystem cache, try reducing your JVM heap size, using an FSDirectory
instead of a RAMDirectory, and see if your filesystem cache does OK. If
you have 12GB, then you should have enough RAM to hold both the old and
new indexes during the switchover.

-chris

On 1/26/06, Daniel Pfeifer [EMAIL PROTECTED] wrote:
 Hi,



 Got more questions regarding Lucene, and this time it's about
 performance ;-)

 We are currently using RAMDirectories to read our indexes. This has
 now become a problem since our index has grown to approximately 5GB of
 RAM, the machine we are running on only has 12GB of RAM, and every
 time we refresh the RAMDirectories we of course keep the old
 Searchables so that there is no service interruption.

 This means we consume 10GB of RAM from time to time. One solution is
 of course to stop using RAM and read everything from disk, but I can
 imagine that the performance will decrease significantly. Is there any
 workaround you can think of? Perhaps a hybrid between FSDirectory and
 RAMDirectory, for example one where only frequently searched documents
 are cached and the others are read from disk?



 Well, I'd appreciate any ideas at all!
 Thanks
 /Daniel
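The hybrid idea in the quoted message (keep only frequently searched
entries in RAM, read everything else from disk) can be sketched with a
plain-JDK LRU cache. This is illustrative only, not a Lucene API; the
QueryCache name and its key/value types are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch for the "hybrid" idea: keep hot search
 *  results in memory, fall back to disk for everything else.
 *  (Hypothetical helper, not part of Lucene.) */
class QueryCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    QueryCache(int capacity) {
        super(16, 0.75f, true);   // access-order: recently used entries stay
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict least-recently-used beyond capacity
    }
}
```

Because the map is access-ordered, each lookup refreshes an entry's
position, so frequently searched keys stay resident while cold ones are
evicted once the capacity is exceeded.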




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [SPAM] - Re: Performance tips? - Sending mail server found on bl.spamcop.net

2006-01-27 Thread Daniel Pfeifer
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find
a class called MapDirectory or MMapDirectory.

/Daniel

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: den 27 januari 2006 11:43
To: java-user@lucene.apache.org
Subject: [SPAM] - Re: Performance tips? - Sending mail server found on
bl.spamcop.net

Daniel Pfeifer wrote:
 We are sporting Solaris 10 on a Sun Fire machine with four cores and
 12GB of RAM and mirrored Ultra 320 disks. I guess I could try
 switching to FSDirectory and hope for the best.

Or, since you're on a 64-bit platform, try MMapDirectory, which
supports greater parallelism than FSDirectory.

Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Two strange things in Lucene

2006-01-26 Thread Daniel Pfeifer
 Since I didn't find anything in the log from log4j I did a kill -3 on
 the process and found two very interesting things:

 Almost all MultiSearcher threads were in this state:

 "MultiSearcher thread #1" daemon prio=10 tid=0x01900960
 nid=0x81442c waiting for monitor entry
 [0xfd7d269ff000..0xfd7d269ffb50]
  at java.util.Vector.size(Vector.java:270)
  - waiting to lock <0xfd7f0114ea28> (a java.util.Vector)
  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:95)
 
 I don't know about this one, but I'm guessing that it just happens to
 be a normal state of the system when you killed the process.  *shrugs*

You probably missed the -3 parameter: kill -3 just dumps the state of
the virtual machine, it doesn't actually kill the JVM. Thus I believe
that this is not a normal state.
 
 And, additionally, I found another stack trace in the stdout log
 which I find interesting:

 Exception in thread "MultiSearcher thread #1"
 org.apache.lucene.search.BooleanQuery$TooManyClauses

 This is a typical occurrence when using queries that expand, such as
 WildcardQuery, RangeQuery, FuzzyQuery, etc.  If users are doing
 queries like "a*" and there are over 1024 terms that start with "a",
 then you will, by default, blow up WildcardQuery's expansion into a
 BooleanQuery.  You can raise that limit on BooleanQuery, or perhaps
 disallow those types of queries.
 
Ok, I'll see what I can do.
 
Thanks!
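For what it's worth, the limit mentioned above is a static setter on
BooleanQuery (this matches the Lucene 1.4 API; the value 4096 is just an
arbitrary example, not a recommendation):

```java
// Raise the global clause limit before running queries that expand
// (WildcardQuery, RangeQuery, FuzzyQuery). The default is 1024.
BooleanQuery.setMaxClauseCount(4096);
```

Note that a higher limit trades the TooManyClauses failure for more
memory and CPU spent expanding the query.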


Limiting hits?

2006-01-19 Thread Daniel Pfeifer
Hi,

I am currently looking for a way to limit the number of Hits which are
returned by a Query.

What I am doing is the following:

Searcher s = ...;
Query q = QueryParser.parse(..., ..., new StandardAnalyzer());
Hits hits = s.search(q);

We have approximately 10 million products in our index, and of these 10
million products there might be 100,000 which have the word "processor"
in their description.

Say a user on our website searches for "processor": the index (to which
I connect by RMI) finds 100,000 products and returns these Hits.

Is it possible to implement a way to return no more than 1000 products?
Is it possible to add something like "name:processor AND
maxresults:1000"?

Thanks in advance!
/Daniel
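There is no query-syntax way to express this, but two approaches work
against the Lucene 1.4 API (a sketch, reusing the s and q variables from
the question above):

```java
// Cap rendering at 1000 hits; Hits fetches documents lazily,
// so the remaining ~99,000 are never materialized on the client.
Hits hits = s.search(q);
int limit = Math.min(hits.length(), 1000);
for (int i = 0; i < limit; i++) {
    Document doc = hits.doc(i);
    // ... render doc ...
}

// Alternatively, ask the searcher for at most 1000 scored documents
// up front, bounding the work on the server side as well:
TopDocs top = s.search(q, null, 1000);
```

The TopDocs form is usually preferable over RMI, since it bounds what
crosses the wire rather than just what you iterate.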

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



A couple of questions regarding load balancing and failover

2005-11-30 Thread Daniel Pfeifer
Hi,

I work for a major Application Service Provider in Europe, and for a
couple of months now we have very successfully used Lucene 1.4. We are
overall very pleased with it, but as the load on the application which
uses Lucene increased, we were forced to invest in better hardware and
also in redundancy.

Since I am not 100% sure that everything is implemented as it should
be, I would like to ask you all to answer a couple of questions. First,
however, I want to explain what our architecture currently looks like:

We run a very service-oriented architecture, so lots of our
applications use JINI and RMI services. We currently have four main
applications which use Lucene, and these applications connect to our
Lucene index by RMI. We've got two Lucene servers, and both access the
same index files, which are placed on a shared drive. These two servers
simply expose the indexes through a RemoteSearchable, and all
applications which use Lucene connect to these RemoteSearchables via
RMI. Also, we have another server which does nothing but update the
index files.

Now my questions:

1.) Does Lucene's MultiSearcher implement some kind of automatic
failover and/or load-balancing mechanism if the two Searchables I
supply in MultiSearcher's constructor go to two different servers but
to the very same index files? I.e., if server 1 crashes there is still
server 2, and thus at least one server will be able to complete the
request. Will a MultiSearcher which is using RemoteSearchables from two
servers automatically detect that Searchable number 1 (server 1) does
not respond and then try Searchable 2? If not, what is the recommended
way of doing this? The second part of the question is: if both
Searchables are available and working, will the MultiSearcher
automatically distribute requests across both Searchables, or is there
a risk that we get duplicates, since both Searchables actually expose
the same indexes? If it doesn't, what would be the recommended way of
implementing load distribution over several servers?
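To the best of my knowledge, MultiSearcher merges results from all of
its Searchables rather than failing over between them, so pointing one
at two copies of the same index would indeed produce duplicates. A
client-side failover wrapper is one way to get redundancy; this is a
plain-Java sketch in which SearchService is a hypothetical stand-in for
an RMI stub to a RemoteSearchable, not a Lucene class:

```java
/** Client-side failover across two equivalent remote searchers.
 *  Both servers expose the same index, so either answer is correct. */
class FailoverSearcher {
    interface SearchService {
        String search(String query) throws Exception;
    }

    private final SearchService primary;
    private final SearchService backup;

    FailoverSearcher(SearchService primary, SearchService backup) {
        this.primary = primary;
        this.backup = backup;
    }

    String search(String query) throws Exception {
        try {
            return primary.search(query);  // normal path
        } catch (Exception e) {            // e.g. RemoteException: server 1 down
            return backup.search(query);   // same index, same result
        }
    }
}
```

Real code would want a timeout and perhaps alternation of the two
servers for load spreading, but the try/catch fallback is the core of
the pattern.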

2.) Our index servers, which expose the underlying index as a
RemoteSearchable, each have four dual-core processors. Since we thus
have great multithreading capabilities, I use the ParallelMultiSearcher
instead of the MultiSearcher. On the client side (the application which
connects to the index RMI server), should I therefore also be using a
ParallelMultiSearcher, or is it OK if I use the standard MultiSearcher?
And why?

3.) Currently, to increase speed, we load the entire index into memory
(using RAMDirectory rather than FSDirectory). We found out that the
RAMDirectory will not update itself when the files in the directory
from which it loaded the index are updated. Therefore I simply coded a
thread which every 10 minutes instantiates new RAMDirectories, unbinds
the current RemoteSearchable, and then rebinds to the RMI registry with
a new Searchable which uses the new RAMDirectories. This certainly
doesn't feel like a good solution: even though the time during which
the RMI service cannot answer is minimal, there is still a small chance
that at that very moment a client application tries to find something
in the index. Is there a way to refresh the RAMDirectory without having
to create new instances of all classes and bind these classes to the
RMI registry? If so, how?
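For question 3, one common pattern is to bind a single long-lived remote
object in the registry and atomically swap the searcher it delegates to;
nothing is ever unbound, so there is no window where lookups fail. A
minimal plain-Java sketch (the SwappableSearcher type is a hypothetical
stand-in, not a Lucene or RMI class):

```java
import java.util.concurrent.atomic.AtomicReference;

/** A stable front object whose backing searcher can be swapped atomically.
 *  Bind ONE instance of this in the RMI registry; when a fresh
 *  RAMDirectory-based searcher is ready, call swap() instead of
 *  unbinding and rebinding. */
class SwappableSearcher<T> {
    private final AtomicReference<T> delegate;

    SwappableSearcher(T initial) {
        this.delegate = new AtomicReference<>(initial);
    }

    T current() {            // every incoming search call goes through this
        return delegate.get();
    }

    void swap(T fresh) {     // atomic; in-flight calls keep the old delegate
        delegate.set(fresh);
    }
}
```

One caveat: the old searcher should only be closed once any in-flight
searches that obtained it via current() have finished.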

I would be enormously thankful if you could answer my questions, as our
load is increasing daily and we would like to have our Lucene index
working as smoothly as possible!

Daniel Pfeifer

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]