Re: Help optimizing
On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
> Hi (again) people
> We've now invested in a server with 8 GB of RAM after too many OutOfMemory errors. Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb.
> When performing a search, the results vary between 1.5 seconds up to 60 seconds. I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just... well, scary.

Is this pure Solr time or overall application time? I ask, b/c it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify. Also, have you done any profiling to see where the hotspots are?

-Grant
RE: Help optimizing
One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this; the servlet containers have controls for it, but you have to dig very deep to find them.

Lance Norskog

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing
Re: Help optimizing
Hello,

If you are using Jetty, you don't have to dig very deep - just look for the section about threads. Here is a snippet from Jetty 6.1.9's jetty.xml:

    <Set name="ThreadPool">
      <!-- Default bounded blocking threadpool -->
      <New class="org.mortbay.thread.BoundedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">50</Set>
        <Set name="lowThreads">25</Set>
      </New>

      <!-- New queued blocking threadpool : better scalability
      <New class="org.mortbay.thread.QueuedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">25</Set>
        <Set name="lowThreads">5</Set>
        <Set name="SpawnOrShrinkAt">2</Set>
      </New>
      -->

      <!-- Optional Java 5 bounded threadpool with job queue
      <New class="org.mortbay.thread.concurrent.ThreadPool">
        <Set name="corePoolSize">50</Set>
        <Set name="maximumPoolSize">50</Set>
      </New>
      -->
    </Set>

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Lance Norskog [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Cc: Norskog, Lance [EMAIL PROTECTED]
Sent: Tuesday, May 6, 2008 1:26:28 PM
Subject: RE: Help optimizing
Re: Help optimizing
Thanks Otis!

On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:
> You have a lot of fields of type text, but a number of fields sound like they really need not be tokenized and should thus be of type string.

I've changed quite a few of them over to string. Still not sure about the difference between 'string' and 'text' :-/

> Do you really need 6 warming searchers?

That I have no idea about. Currently it's a very small site, well, visitor-wise anyway.

> I think the date type is pretty granular. Do you really need that type of precision?

Probably not; I have changed it to sint and will index the date in this format: 20070310, which should do the trick.

> I don't have a shell handy here to check, but is that 'M' in -Xmx... recognized, or should it be lowercase 'm'?

"Append the letter k or K to indicate kilobytes, or the letter m or M to indicate megabytes.", so yeah, it should recognize it.

> Have you noticed anything weird while looking at the Solr Java process with jConsole?

I'm not very familiar with Java, so no idea what jConsole is :-/

Will be re-indexing tomorrow with the date-to-sint and text-to-string changes; will report back after it's done.

Cheers, Daniel
Re: Help optimizing
On May 6, 2008, at 4:00 AM, Mike Klaas wrote:
> On 3-May-08, at 10:06 AM, Daniel Andersson wrote:
>> How do I optimize Solr to better use all the RAM? I'm using java6, 64bit version, and start Solr using: java -Xmx7500M -Xms4096M -jar start.jar. But according to top it only seems to be using 7.7% of the memory (around 600 MB).
> Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this shows up as 'cached' in top). You should have at least 2 GB of memory for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely).
> Solr will require a wide distribution of queries to warm up (get the index into the OS disk cache). This automatically prioritizes the hot spots in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

Ah. Have given it 4 GB of RAM now (Xmx=4 GB, Xms=2 GB).

>> Most queries are for make_id + model_id or city + state, and almost all of the queries are ordered by datetime_found (newest - oldest).
> How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

Average documents matched/found: 6,427. We only return 10 documents per page. We commit every 10,000 documents. Tried it at 100,000 with 2 GB of RAM (1 GB dedicated to Solr) and it just gave me OutOfMemory every time. Haven't tried increasing it since moving to this new server.

Cheers, Daniel
Re: Help optimizing
On May 6, 2008, at 2:19 PM, Grant Ingersoll wrote:
> On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
>> When performing a search, the results vary between 1.5 seconds up to 60 seconds.
> Is this pure Solr time or overall application time? I ask, b/c it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.

It's 1.5 seconds to send the command to Solr, wait for it to search, and get the data back. The web server is located in the US and the Solr machine is in Sweden (don't ask), so I can see it taking a while to send data back and forth; getting the searches below 1.5 seconds is not something I'm expecting. I just want to get away from the 5-second searches.

Is there a way of getting Solr to output the total time spent on any command? Just so I can eliminate some odd network problem/error.

> Also, have you done any profiling to see where the hotspots are?

I have not. Not a Java person, so not sure how to do this. Is there something in the Solr admin that will allow me to do this? Have looked around and read what I could find in the wiki, but didn't find anything that looked like profiling.

Cheers, Daniel
Re: Help optimizing
On May 6, 2008, at 7:26 PM, Lance Norskog wrote:
> One cause of out-of-memory is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this that you have to dig very deep to find.

Unfortunately the website is still very small in terms of visitors. We were running MySQL, Apache, and Solr on the same machine, which only had 2 GB of RAM, so it's understandable if Solr throws an error or two at me.

Cheers, Daniel
RE: Help optimizing
There are two integer types, 'sint' and 'integer'. On 'integer' you cannot do a range query (which makes sense). But: Lucene sorting builds an array with an entry for every record. On an integer field it creates a plain int array; on any other kind of field, each array item carries a lot more. So if you want fast sorts with a small memory footprint, you want 'integer' = 20070310, not 'sint' = 20070310. We did exactly this, for exactly this reason.

-----Original Message-----
From: Daniel Andersson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 2:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing
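In schema.xml terms, the trade-off looks something like this (a sketch only; the field name is illustrative, and 'sint'/'integer' are the stock Solr 1.x example-schema type names):

```xml
<!-- 'sint' (sortable int): supports range queries, but per the point
     above, each sort-cache entry costs more memory than a plain int. -->
<field name="datetime_found" type="sint" indexed="true" stored="true"/>

<!-- 'integer' (plain int): no range queries, but sorting uses a compact
     int array, so the memory footprint is much smaller. -->
<!-- <field name="datetime_found" type="integer" indexed="true" stored="true"/> -->
```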
Re: Help optimizing
Daniel - regarding query time: yes, look at the response (assuming you are using XML responses) and look for QTime in the top part of the response. That's the number of milliseconds it took to execute the query. This time does not include network time (the request to Solr plus the time to send the whole response back to the client).

US -- Sweden... nice ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Daniel Andersson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, May 6, 2008 6:01:01 PM
Subject: Re: Help optimizing
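For reference, the top of a standard Solr XML response looks something like this (the values shown are illustrative):

```xml
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <!-- QTime: milliseconds Solr spent executing the query;
         network transfer time is not included -->
    <int name="QTime">87</int>
  </lst>
  <!-- the result element with the matching documents follows -->
</response>
```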
Re: Help optimizing
Daniel,

The main difference is that string-type fields are not tokenized, while text-type fields are. Example input text: "milk with honey is god". A string field will end up with a single token: "milk with honey is god". A text field will end up with 5 tokens (assuming no stop-word filtering): milk, with, honey, is, god.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Daniel Andersson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, May 6, 2008 5:43:44 PM
Subject: Re: Help optimizing
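In schema.xml, that difference comes down to the field type (a sketch; the field names are illustrative, and 'string'/'text' are the stock example-schema types):

```xml
<!-- Untokenized: the whole value is indexed as a single term; good for
     exact matching, faceting, and sorting. -->
<field name="city_exact" type="string" indexed="true" stored="true"/>

<!-- Tokenized: an analyzer splits the value into terms; good for
     full-text search. -->
<field name="city" type="text" indexed="true" stored="true"/>
```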
Re: Help optimizing
On 3-May-08, at 10:06 AM, Daniel Andersson wrote:
> Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb. When performing a search, the results vary between 1.5 seconds up to 60 seconds. I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just... well, scary.

That is too long, and shouldn't be happening.

> How do I optimize Solr to better use all the RAM? I'm using java6, 64bit version, and start Solr using: java -Xmx7500M -Xms4096M -jar start.jar. But according to top it only seems to be using 7.7% of the memory (around 600 MB).

Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this shows up as 'cached' in top). You should have at least 2 GB of memory for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely).

Solr will require a wide distribution of queries to warm up (get the index into the OS disk cache). This automatically prioritizes the hot spots in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

> Most queries are for make_id + model_id or city + state, and almost all of the queries are ordered by datetime_found (newest - oldest).

How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

-Mike
Re: Help optimizing
Daniel,

1.5 seconds is quite long, and 60 seconds is more than scary - smells like something is not set up right.

You have a lot of fields of type text, but a number of fields sound like they really need not be tokenized and should thus be of type string.

Do you really need 6 warming searchers?

I think the date type is pretty granular. Do you really need that type of precision?

I don't have a shell handy here to check, but is that 'M' in -Xmx... recognized, or should it be lowercase 'm'?

Have you noticed anything weird while looking at the Solr Java process with jConsole?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Daniel Andersson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 7:06:10 PM
Subject: Help optimizing

Hi (again) people

We've now invested in a server with 8 GB of RAM after too many OutOfMemory errors. Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb.

When performing a search, the results vary between 1.5 seconds up to 60 seconds. I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just... well, scary.

How do I optimize Solr to better use all the RAM? I'm using java6, 64bit version, and start Solr using: java -Xmx7500M -Xms4096M -jar start.jar. But according to top it only seems to be using 7.7% of the memory (around 600 MB).

Most queries are for make_id + model_id or city + state, and almost all of the queries are ordered by datetime_found (newest - oldest).
from schema.xml:
required=true / multiValued=false / multiValued=false / multiValues=true / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / multiValued=false / stored=true multiValued=false / multiValued=true/

from solrconfig.xml:
class=solr.LRUCache size=4096 initialSize=4096 autowarmCount=2048/
class=solr.LRUCache size=4096 initialSize=4096 autowarmCount=2048/
class=solr.LRUCache size=4096 initialSize=4096 autowarmCount=0/
50
1000
datetime_found 0 str 10
datetime_found 0 str 10
true
6

Any help much appreciated!

Kind regards, Daniel