RE: JVM Crash in Lucene
I resolved this issue for the time being by adding the following parameter to the command:

    -XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody

/Daniel

-----Original Message-----
From: Daniel Pfeifer [mailto:[EMAIL PROTECTED]]
Sent: den 8 februari 2006 08:05
To: java-user@lucene.apache.org
Subject: Re: JVM Crash in Lucene

Got the same problem. Running 1.5.0_05 on Solaris 10. I've seen that this issue has been reported on Sun's forum, but no answer yet.

Another interesting thing I noticed: we previously used RAMDirectory and never got JVM crashes while using it. However, once we started using FSDirectory, the JVM started to crash. I tested adding the -Xcomp parameter and the JVM has not crashed yet. But then again, the SearchService hasn't been up long enough to be sure that it solved the problem.

/Daniel

You also might try -Xbatch or -Xcomp to see if that fixes it (or reproduces it faster). Here's a great list of JVM options: http://blogs.sun.com/roller/resources/watt/jvm-options-list.html

-Yonik

On 12/11/05, Yonik Seeley [EMAIL PROTECTED] wrote:

Sounds like it's a HotSpot bug. AFAIK, HotSpot doesn't just compile a method once... it can do optimization over time. To work around it, have you tried the previous version, 1.5_05? It's possible it's a fairly new bug. We've been running with that version and Lucene 1.4.3 without problems (on Opteron, RHEL4). You could also try the latest Lucene 1.9 to see if that changes enough to avoid the bug.

-Yonik

On 12/11/05, Dan Gould [EMAIL PROTECTED] wrote:

First, thank you Chris, Yonik, and Dan for your ideas as to what might be causing this problem. I tried moving things around so that the IndexReader is still open when it calls TermFreqVector.getTerms()/TermFreqVector.getTermFrequencies(). It didn't seem to make any difference. I also tried running Java with the flags -Xmx2048m -XX:MaxPermSize=200m (the box has 4GB of RAM) and it still crashes.
It's hard to tell, but the program does seem to run for a lot longer (maybe 10 hours), though that could just be randomness in my tests. The JVM always seems to crash with:

    Current CompileTask:
    opto:1836  org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object; (99 bytes)

which in the Lucene source is:

    private static IndexReader open(final Directory directory, final boolean closeDirectory) throws IOException {
      synchronized (directory) {          // in- & inter-process sync
        return (IndexReader)new Lock.With(
            directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
            IndexWriter.COMMIT_LOCK_TIMEOUT) {
          public Object doBody() throws IOException {
            SegmentInfos infos = new SegmentInfos();
            infos.read(directory);
            if (infos.size() == 1) {      // index is optimized
              return SegmentReader.get(infos, infos.info(0), closeDirectory);
            }
            IndexReader[] readers = new IndexReader[infos.size()];
            for (int i = 0; i < infos.size(); i++)
              readers[i] = SegmentReader.get(infos.info(i));
            return new MultiReader(directory, infos, closeDirectory, readers);
          }
        }.run();
      }
    }

That's definitely a non-trivial bit of code, but I can't imagine that there's a problem that I'm seeing that no one else sees. Moreover, that code gets run hundreds or even thousands of times before it crashes, so I don't imagine it's being HotSpot-compiled for the first time. I'm running the 1.4.3 release and the 1.5.0_06-b05 JVM on CentOS Linux on an Opteron. Any further guesses?

Thank you all very much,
Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Sending query to multiple servers and combine all Hits from them ?
You can search all four servers by doing this (the QueryParser in this example uses the Lucene 1.9 syntax):

    Searchable[] searchables = new Searchable[] {
        (Searchable) Naming.lookup(x1),
        (Searchable) Naming.lookup(x2),
        ...
    };
    MultiSearcher multiSearcher = new MultiSearcher(searchables);
    Hits hits = multiSearcher.search(
        new QueryParser("title", new StandardAnalyzer()).parse("title:Ajax"));

/Daniel

From: Vikas Khengare [mailto:[EMAIL PROTECTED]]
Sent: den 2 februari 2006 06:30
To: lucene-user@jakarta.apache.org; lucene-dev@jakarta.apache.org; java-dev@lucene.apache.org; java-user@lucene.apache.org; java-commits@lucene.apache.org
Subject: Sending query to multiple servers and combine all Hits from them ?

Hi Friends...

I am doing a search application which has the following scenario.

Architecture
============
1] Common GUI
2] When a user enters one query, it should go to 4 searcher servers (all servers are on remote machines)
3] After searching, all 4 servers should return results, i.e. Hits (all 4 servers return hit docs in different formats)
4] Combine all Hits and form one type of result
5] Show that result in a uniform way to the user

Problems
========
1] How do I send my search query to all 4 searcher servers?
2] After searching, how do I get all results (Hits) for combining them into one bundle, since they are all in different formats?
3] Shall I use AJAX for sending the query across multiple servers and getting results back from all servers (or any other technology; if yes, please specify)?
4] How do I combine all Hits?

Thanks...
Best Regards
[ [EMAIL PROTECTED] ]
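MultiSearcher does the combining for you: it merges the per-server results into a single relevance-ordered result set and re-bases document numbers, so the caller sees one uniform list. The core merging idea can be sketched in plain Java; `HitMerger`, `Hit`, and `mergeByScore` below are illustrative stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class HitMerger {
    // Stand-in for one remote hit: a document id plus its relevance score.
    static class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Merge per-server hit lists into one list ordered by descending score,
    // i.e. the uniform ordering MultiSearcher presents to the caller.
    static List<Hit> mergeByScore(List<List<Hit>> perServer) {
        List<Hit> all = new ArrayList<Hit>();
        for (List<Hit> hits : perServer) {
            all.addAll(hits);
        }
        Collections.sort(all, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);  // highest score first
            }
        });
        return all;
    }
}
```

In practice you would rarely write this yourself; the sketch only shows what "combine all Hits into one uniform result" means once every server speaks the same Searchable protocol.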
RE: Performance tips?
Well, we are sporting Solaris 10 on a Sun Fire machine with four cores, 12GB of RAM, and mirrored Ultra 320 disks. I guess I could try switching to FSDirectory and hope for the best.

-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]]
Sent: den 27 januari 2006 08:50
To: java-user@lucene.apache.org
Subject: Re: Performance tips?

I seem to say this a lot :), but: assuming your OS has a decent filesystem cache, try reducing your JVM heap size, use an FSDirectory instead of a RAMDirectory, and see if your filesystem cache does OK. If you have 12GB, then you should have enough RAM to hold both the old and new indexes during the switchover.

-chris

On 1/26/06, Daniel Pfeifer [EMAIL PROTECTED] wrote:

Hi,

Got more questions regarding Lucene, and this time it's about performance ;-)

We are currently using RAMDirectories to read our indexes. This has now become a problem, since our index has grown to approximately 5GB of RAM and the machine we are running on only has 12GB. Every time we refresh the RAMDirectories, we of course keep the old Searchables so that there is no service interruption, which means we consume 10GB of RAM from time to time.

One solution is of course to stop using RAM and read everything from disk, but I can imagine that the performance will decrease significantly. Is there any workaround you can think of? Perhaps a hybrid between FSDirectory and RAMDirectory, for example one where only frequently searched documents are cached and the others are read from disk?

Well, I'd appreciate any ideas at all!

Thanks
/Daniel
RE: [SPAM] - Re: Performance tips? - Sending mail server found on bl.spamcop.net
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find a class called MapDirectory or MMapDirectory.

/Daniel

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: den 27 januari 2006 11:43
To: java-user@lucene.apache.org
Subject: [SPAM] - Re: Performance tips? - Sending mail server found on bl.spamcop.net

Daniel Pfeifer wrote:

We are sporting Solaris 10 on a Sun Fire machine with four cores and 12GB of RAM and mirrored Ultra 320 disks. I guess I could try switching to FSDirectory and hope for the best.

Or, since you're on a 64-bit platform, try MMapDirectory, which supports greater parallelism than FSDirectory.

Doug
Re: Two strange things in Lucene
Since I didn't find anything in the log from log4j, I did a kill -3 on the process and found two very interesting things. Almost all MultiSearcher threads were in this state:

    "MultiSearcher thread #1" daemon prio=10 tid=0x01900960 nid=0x81442c waiting for monitor entry [0xfd7d269ff000..0xfd7d269ffb50]
        at java.util.Vector.size(Vector.java:270)
        - waiting to lock <0xfd7f0114ea28> (a java.util.Vector)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:95)

I don't know about this one, but I'm guessing that it just happens to be a normal state of the system when you killed the process. *shrugs*

You probably missed the -3 parameter. This just dumps the state of the virtual machine; it doesn't actually kill the JVM. Thus I believe that this is not a normal state.

And, additionally, I found another stack trace in the stdout log which I find interesting:

    Exception in thread "MultiSearcher thread #1" org.apache.lucene.search.BooleanQuery$TooManyClauses

This is a typical occurrence when using query types that expand, such as WildcardQuery, RangeQuery, FuzzyQuery, etc. If users are doing queries like a* and there are over 1024 terms that start with a, then by default WildcardQuery's expansion into a BooleanQuery will blow up. You can raise that limit on BooleanQuery, or perhaps disallow those types of queries.

Ok, I'll see what I can do. Thanks!
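For reference, the clause limit mentioned above can be raised with BooleanQuery.setMaxClauseCount(int). The alternative of disallowing expensive queries amounts to a pre-flight check before the query ever reaches the searcher. A minimal sketch, with a plain SortedSet standing in for the index's term dictionary (`PrefixGuard` and `countPrefixExpansions` are hypothetical names, not Lucene API):

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixGuard {
    // Mirrors BooleanQuery's out-of-the-box clause limit.
    static final int MAX_CLAUSES = 1024;

    // Count how many indexed terms a prefix query like "a*" would expand to,
    // stopping as soon as the limit would be exceeded.
    static int countPrefixExpansions(SortedSet<String> terms, String prefix) {
        int n = 0;
        for (String t : terms.tailSet(prefix)) {   // terms >= prefix, in order
            if (!t.startsWith(prefix)) break;      // past the prefix range
            if (++n > MAX_CLAUSES) break;          // would blow the clause limit
        }
        return n;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<String>();
        terms.add("ajax");
        terms.add("apache");
        terms.add("lucene");
        terms.add("query");
        System.out.println(countPrefixExpansions(terms, "a"));  // prints 2
        System.out.println(countPrefixExpansions(terms, "z"));  // prints 0
    }
}
```

A query whose count comes back above the limit can then be rejected with a friendly error instead of dying mid-search with TooManyClauses.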
Limiting hits?
Hi,

I am currently looking for a way to limit the number of Hits returned by a Query. What I am doing is the following:

    Searcher s = ...;
    Query q = QueryParser.parse(..., ..., new StandardAnalyzer());
    s.search(q);

We have approximately 10 million products in our index, and of these 10 million products there might be 100.000 which have the word processor in their description. Say a user on our website searches for processor: the index (to which I connect by RMI) finds 100.000 products and returns these Hits. Is it possible to implement a way to return no more than 1000 products? Is it possible to add something like name:processor AND maxresults:1000?

Thanks in advance!
/Daniel
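One thing worth knowing here is that Hits fetches matching documents lazily, so the client is not forced to materialize all 100.000 results: it can simply stop iterating after 1000. A minimal sketch of the capping loop, with a plain List of ids standing in for Hits (`ResultPage` and `firstN` are hypothetical helpers, not Lucene API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResultPage {
    // Return at most maxResults entries from the full hit list.
    static List<String> firstN(List<String> hits, int maxResults) {
        int n = Math.min(maxResults, hits.size());
        List<String> page = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            page.add(hits.get(i));   // with real Hits this would be hits.doc(i)
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("p1", "p2", "p3", "p4", "p5");
        System.out.println(firstN(hits, 3));   // prints [p1, p2, p3]
        System.out.println(firstN(hits, 10));  // prints [p1, p2, p3, p4, p5]
    }
}
```

There is no maxresults: query syntax; the cap lives in the code that walks the results, not in the query itself.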
A couple of questions regarding load balancing and failover
Hi,

I am working for a major Application Service Provider in Europe, and for a couple of months now we have very successfully used Lucene 1.4. We are overall very pleased with it, but as the load on the application which uses Lucene increased, we were forced to invest in better hardware and also in redundancy. Since I am not 100% sure that everything is implemented as it should be, I would like to ask you all a couple of questions. First, however, I want to explain what our architecture currently looks like.

We run a very service-oriented architecture, and thus lots of our applications use JINI and RMI services. We currently have four main applications which use Lucene, and these applications connect to our Lucene index by RMI. We've got two Lucene servers, and both access the same index files, which are placed on a shared drive. These two servers simply expose the indexes via a RemoteSearchable, and all applications which use Lucene connect to these RemoteSearchables via RMI. Also, we have another server which does nothing but update the index files.

Now my questions:

1.) Does Lucene's MultiSearcher implement some kind of automatic failover and/or load-balancing mechanism if both Searchables which I supply in MultiSearcher's constructor go to two different servers but to the very same index files? I.e., if server 1 crashes there is still server 2, and thus at least one server will be able to complete the request. Will the MultiSearcher which is using RemoteSearchables from two servers automatically detect that Searchable number 1 (server 1) does not respond and then try Searchable 2? If not, what is the recommended way of doing this? And the second part of the question: if both Searchables are available and working, will the MultiSearcher automatically distribute requests to both Searchables, or is there a risk that we get duplicates since both Searchables actually expose the same indexes?
If this isn't the case, what would be the recommended way of implementing load distribution over several servers?

2.) On our index servers, which expose the underlying index as a RemoteSearchable, we have four dual-core processors each. Since we thus have great multithreading capabilities, I use the ParallelMultiSearcher instead of the MultiSearcher. On the client side (the application which connects to the index RMI server), should I therefore also be using a ParallelMultiSearcher, or is it OK if I use the standard MultiSearcher? And if so, why?

3.) Currently, to increase speed, we are loading the entire index into memory (using RAMDirectory rather than FSDirectory). We found out that the RAMDirectory will not update itself if the files in the directory from which it loaded the index are updated. Therefore I simply coded a thread which every 10 minutes instantiates new RAMDirectories, unbinds the current RemoteSearchable, and then rebinds to the RMI registry with a new Searchable which uses the new RAMDirectories. This certainly doesn't feel like a good solution: even though the time during which the RMI service is unable to answer is minimal, there is still a small chance that at that very moment a client application tries to find something in the index. Is there a way to refresh the RAMDirectory without having to create new instances of all classes and bind these classes to the RMI registry? If so, how?

I would be enormously thankful if you guys could answer my questions, as our load is increasing daily and we would like to have our Lucene index working as smoothly as possible!

Daniel Pfeifer
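On question 3, one common pattern is to keep a single exported front object bound in the RMI registry and atomically swap the searcher behind it: the registry binding never changes, so there is no window in which a client lookup can fail. A minimal pure-Java sketch of the swap (`HotSwapSearcher` and its nested `Searchable` are illustrative stand-ins, not Lucene's or java.rmi's classes):

```java
import java.util.concurrent.atomic.AtomicReference;

public class HotSwapSearcher {
    // Stand-in for a Lucene Searchable backed by some Directory.
    interface Searchable {
        String search(String query);
    }

    // The exported front object holds the current delegate; concurrent reads
    // see either the old or the new searcher, never a missing binding.
    private final AtomicReference<Searchable> delegate;

    HotSwapSearcher(Searchable initial) {
        delegate = new AtomicReference<Searchable>(initial);
    }

    String search(String query) {
        return delegate.get().search(query);
    }

    // Called by the refresh thread once it has built a searcher
    // over the freshly loaded RAMDirectories.
    void swap(Searchable fresh) {
        delegate.set(fresh);   // the old searcher can be closed afterwards
    }
}
```

The refresh thread builds the new RAMDirectory-backed searcher in the background, calls swap(), and only then closes the old one, so no unbind/rebind against the RMI registry is ever needed.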