Re: Help optimizing

2008-05-06 Thread Grant Ingersoll


On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:


Hi (again) people

We've now invested in a server with 8 GB of RAM after too many  
OutOfMemory-errors.


Our database/index is 3.5 GB and contains 4,352,471 documents. Most  
documents are less than 1 kb. When performing a search, the results  
vary from 1.5 seconds up to 60 seconds.


I don't have a big problem with 1.5 seconds (even though below 1  
would be nice), but 60 seconds is just.. well, scary.


Is this pure Solr time or overall application time?  I ask, b/c it is  
often the case that people are measuring application time and the  
problem lies in the application, so I just want to clarify.


Also, have you done any profiling to see where the hotspots are?

-Grant


RE: Help optimizing

2008-05-06 Thread Lance Norskog
One cause of out-of-memory is multiple simultaneous requests. If you limit
the query stream to one or two simultaneous requests, you might fix this.
No, Solr does not have an option for this. The servlet containers have
controls for this that you have to dig very deep to find.

Lance Norskog 

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 06, 2008 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing


On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:

 Hi (again) people

 We've now invested in a server with 8 GB of RAM after too many 
 OutOfMemory-errors.

 Our database/index is 3.5 GB and contains 4,352,471 documents. Most 
 documents are less than 1 kb. When performing a search, the results 
 vary from 1.5 seconds up to 60 seconds.

 I don't have a big problem with 1.5 seconds (even though below 1 would 
 be nice), but 60 seconds is just.. well, scary.

Is this pure Solr time or overall application time?  I ask, b/c it is often
the case that people are measuring application time and the problem lies in
the application, so I just want to clarify.

Also, have you done any profiling to see where the hotspots are?

-Grant



Re: Help optimizing

2008-05-06 Thread Otis Gospodnetic
Hello,

If you are using Jetty, you don't have to dig very deep - just look for the 
section about threads.  Here is a snippet from Jetty 6.1.9's jetty.xml:


<Set name="ThreadPool">
  <!-- Default bounded blocking threadpool -->
  <New class="org.mortbay.thread.BoundedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">50</Set>
    <Set name="lowThreads">25</Set>
  </New>

  <!-- New queued blocking threadpool : better scalability
  <New class="org.mortbay.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">25</Set>
    <Set name="lowThreads">5</Set>
    <Set name="SpawnOrShrinkAt">2</Set>
  </New>
  -->

  <!-- Optional Java 5 bounded threadpool with job queue
  <New class="org.mortbay.thread.concurrent.ThreadPool">
    <Set name="corePoolSize">50</Set>
    <Set name="maximumPoolSize">50</Set>
  </New>
  -->
</Set>
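If you actually want to cap Solr at one or two concurrent queries, as Lance 
suggests, a rough (untested) sketch using the BoundedThreadPool above would 
just shrink the pool - keeping in mind that Jetty needs a handful of threads 
for its own accept/housekeeping work, so the practical floor is well above 1:

<Set name="ThreadPool">
  <New class="org.mortbay.thread.BoundedThreadPool">
    <!-- small pool: only a few requests execute at once, the rest queue -->
    <Set name="minThreads">5</Set>
    <Set name="maxThreads">10</Set>
  </New>
</Set>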


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
 From: Lance Norskog [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Cc: Norskog, Lance [EMAIL PROTECTED]
 Sent: Tuesday, May 6, 2008 1:26:28 PM
 Subject: RE: Help optimizing
 
 One cause of out-of-memory is multiple simultaneous requests. If you limit
 the query stream to one or two simultaneous requests, you might fix this.
 No, Solr does not have an option for this. The servlet containers have
 controls for this that you have to dig very deep to find.
 
 Lance Norskog 
 
 -Original Message-
 From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 06, 2008 5:19 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Help optimizing
 
 
 On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
 
  Hi (again) people
 
  We've now invested in a server with 8 GB of RAM after too many 
  OutOfMemory-errors.
 
  Our database/index is 3.5 GB and contains 4,352,471 documents. Most 
  documents are less than 1 kb. When performing a search, the results 
  vary from 1.5 seconds up to 60 seconds.
 
  I don't have a big problem with 1.5 seconds (even though below 1 would 
  be nice), but 60 seconds is just.. well, scary.
 
 Is this pure Solr time or overall application time?  I ask, b/c it is often
 the case that people are measuring application time and the problem lies in
 the application, so I just want to clarify.
 
 Also, have you done any profiling to see where the hotspots are?
 
 -Grant
 
 




Re: Help optimizing

2008-05-06 Thread Daniel Andersson

Thanks Otis!


On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:

You have a lot of fields of type text, but a number of fields sound  
like they really need not be tokenized and should thus be of type  
string.


I've changed quite a few of them over to string. Still not sure about  
the difference between 'string' and 'text' :-/




Do you really need 6 warming searchers?


That I have no idea about. Currently it's a very small site, well,  
visitor-wise anyway.



I think date type is pretty granular.  Do you really need that  
type of precision?


Probably not, have changed it to sint and will index the date in this  
format 20070310, which should do the trick.
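Roughly like this in schema.xml, assuming the example schema's sint 
(solr.SortableIntField) type - the field name is the one from our setup:

<field name="datetime_found" type="sint" indexed="true" stored="true" multiValued="false" />

with, e.g., 2007-03-10 indexed as 20070310.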



I don't have a shell handy here to check, but is that 'M' in -Xmx...  
recognized, or should it be lowercase 'm'?


"Append the letter k or K to indicate kilobytes, or the letter m or M  
to indicate megabytes", so yeah, it should recognize it.



Have you noticed anything weird while looking at the Solr Java  
process with jConsole?


I'm not very familiar with Java, so no idea what jConsole is :-/


Will be re-indexing tomorrow with the date->sint and text->string  
changes, will report back after it's done.


Cheers,
Daniel


Re: Help optimizing

2008-05-06 Thread Daniel Andersson


On May 6, 2008, at 4:00 AM, Mike Klaas wrote:


On 3-May-08, at 10:06 AM, Daniel Andersson wrote:


How do I optimize Solr to better use all the RAM? I'm using java6,  
64bit version, and start Solr using:

java -Xmx7500M -Xms4096M -jar start.jar

But according to top it only seems to be using 7.7% of the memory  
(around 600 MB).


Don't try to give Solr _all_ the memory on the system.  Solr depends  
on the index existing in the OS's disk cache (this shows up as "cached" in  
top).  You should have at least 2 GB of memory for a 3.5GB index,  
depending on how much of the index is stored (best is of course to  
have 3.5GB available so it can be cached completely).


Solr will require a wide distribution of queries to warm up (get  
the index in the OS disk cache).   This automatically prioritizes  
the hot spots in the index.  If you want to load the whole thing,  
'cd datadir; cat * > /dev/null' works, but I don't recommend relying  
on that.


Ah. Have given it 4 GB of RAM now (Xmx=4 GB, Xms=2 GB)


Most queries are for make_id + model_id or city + state and almost  
all of the queries are ordered by datetime_found (newest -> oldest).


How many documents match, typically?  How many documents are  
returned, typically?  How often do you commit() [I suspect  
frequently, based on the problems you are having]?



Average documents matched/found: 6427
Only return 10 documents per page

Commit every 10,000 documents. Tried it at 100,000 with 2 GB of RAM (1  
GB dedicated to Solr) and it just gave me OutOfMemory every time.  
Haven't tried increasing it since moving to this new server.
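(One thing I haven't tried: letting Solr batch commits itself instead of 
committing from the client. If your Solr version supports autoCommit, a 
sketch in solrconfig.xml - the threshold is illustrative - would be:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit automatically once this many documents are pending -->
    <maxDocs>100000</maxDocs>
  </autoCommit>
</updateHandler>
)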


Cheers,
Daniel


Re: Help optimizing

2008-05-06 Thread Daniel Andersson


On May 6, 2008, at 2:19 PM, Grant Ingersoll wrote:


On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:

When performing a search, the results vary from 1.5 seconds up  
to 60 seconds.


Is this pure Solr time or overall application time?  I ask, b/c it  
is often the case that people are measuring application time and the  
problem lies in the application, so I just want to clarify.


It's 1.5 seconds to send the command to Solr, wait for it to search,  
and get the data back.


The web server is located in the US and the Solr-machine is in Sweden  
(don't ask), so I can see it taking a while to send data back and  
forth, so getting the searches below 1.5s is not something I'm  
expecting. I just want to get away from the >5s searches.


Is there a way of getting Solr to output the total time spent on any  
command? Just so I can eliminate some odd network problem/error.




Also, have you done any profiling to see where the hotspots are?


I have not. Not a Java person, so not sure how to do this. Is there  
something in the Solr admin that will allow me to do this? Have looked  
around and read what I could find in the Wiki, but didn't find  
anything that looked like profiling.


Cheers,
Daniel


Re: Help optimizing

2008-05-06 Thread Daniel Andersson


On May 6, 2008, at 7:26 PM, Lance Norskog wrote:

One cause of out-of-memory is multiple simultaneous requests. If you  
limit
the query stream to one or two simultaneous requests, you might fix  
this.

No, Solr does not have an option for this. The servlet containers have
controls for this that you have to dig very deep to find.


Unfortunately the website is still very small, in terms of visitors.
We were running MySQL, Apache and Solr on the same machine, which only had  
2 GB of RAM, so it's understandable if Solr throws an error or two at me.


Cheers,
Daniel


RE: Help optimizing

2008-05-06 Thread Lance Norskog
There are two integer types, 'sint' and 'integer'. On an 'integer' field you
cannot do a range check (one that makes sense, anyway).
But!  Lucene sort makes an array of integers for every record. On an 'integer'
field, it creates a plain int array. On any other kind of field, each array
item carries a lot more.

So, if you want fast sorts with a small memory footprint, you want 'integer' =
20070310, not 'sint' = 20070310.  We did exactly this for exactly this
reason.
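In schema.xml terms (field name borrowed from Daniel's earlier mail; 'integer'
is the plain solr.IntField type in the example schema), the sort-friendly
variant is just:

<!-- plain int: cheap to sort on, but range queries won't order sensibly -->
<field name="datetime_found" type="integer" indexed="true" stored="true" multiValued="false" />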

-Original Message-
From: Daniel Andersson [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 06, 2008 2:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing

Thanks Otis!


On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:

 You have a lot of fields of type text, but a number of fields sound 
 like they really need not be tokenized and should thus be of type 
 string.

I've changed quite a few of them over to string. Still not sure about the
difference between 'string' and 'text' :-/


 Do you really need 6 warming searchers?

That I have no idea about. Currently it's a very small site, well,
visitor-wise anyway.


 I think date type is pretty granular.  Do you really need that type 
 of precision?

Probably not, have changed it to sint and will index the date in this format
20070310, which should do the trick.


 I don't have a shell handy here to check, but is that 'M' in -Xmx...  
 recognized, or should it be lowercase 'm'?

"Append the letter k or K to indicate kilobytes, or the letter m or M to
indicate megabytes", so yeah, it should recognize it.


 Have you noticed anything weird while looking at the Solr Java process 
 with jConsole?

I'm not very familiar with Java, so no idea what jConsole is :-/


Will be re-indexing tomorrow with the date->sint and text->string changes,
will report back after it's done.

Cheers,
Daniel



Re: Help optimizing

2008-05-06 Thread Otis Gospodnetic
Daniel - regarding query time - yes, look at the response (assuming you are 
using XML responses) and look for QTime in the top part of the response.  
That's the number of milliseconds it took to execute the query.  This time does 
not include the network time (request to Solr + time to send the whole response 
back to the client).
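For example, the header of an XML response looks like this, with QTime in 
milliseconds:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1432</int>
  </lst>
  ...
</response>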

US <-> Sweden... nice ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
 From: Daniel Andersson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 6, 2008 6:01:01 PM
 Subject: Re: Help optimizing
 
 
 On May 6, 2008, at 2:19 PM, Grant Ingersoll wrote:
 
  On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
 
  When performing a search, the results vary from 1.5 seconds up  
  to 60 seconds.
 
  Is this pure Solr time or overall application time?  I ask, b/c it  
  is often the case that people are measuring application time and the  
  problem lies in the application, so I just want to clarify.
 
 It's 1.5 seconds to send the command to Solr, wait for it to search,  
 and get the data back.
 
 The web server is located in the US and the Solr-machine is in Sweden  
 (don't ask), so I can see it taking a while to send data back and  
 forth, so getting the searches below 1.5s is not something I'm  
 expecting. I just want to get away from the >5s searches.
 
 Is there a way of getting Solr to output the total time spent on any  
 command? Just so I can eliminate some odd network problem/error.
 
 
  Also, have you done any profiling to see where the hotspots are?
 
 I have not. Not a Java person, so not sure how to do this. Is there  
 something in the Solr admin that will allow me to do this? Have looked  
 around and read what I could find in the Wiki, but didn't find  
 anything that looked like profiling.
 
 Cheers,
 Daniel
 




Re: Help optimizing

2008-05-06 Thread Otis Gospodnetic
Daniel,

The main difference is that string type fields are not tokenized, while text 
type fields are.
Example:
input text: "milk with honey is good"
String fields will end up with a single token: "milk with honey is good"
Text fields will end up with 5 tokens (assuming no stop word filtering): 
"milk", "with", "honey", "is", "good"
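In schema.xml that's purely a matter of the field's type (the field names 
here are made up; the type definitions are the stock ones from the example 
schema):

<!-- string: indexed verbatim as one token - good for ids, exact filters, faceting -->
<field name="city_exact" type="string" indexed="true" stored="true" />
<!-- text: run through an analyzer and split into tokens - good for free-text search -->
<field name="description" type="text" indexed="true" stored="true" />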

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
 From: Daniel Andersson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 6, 2008 5:43:44 PM
 Subject: Re: Help optimizing
 
 Thanks Otis!
 
 
 On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:
 
  You have a lot of fields of type text, but a number of fields sound  
  like they really need not be tokenized and should thus be of type  
  string.
 
 I've changed quite a few of them over to string. Still not sure about  
 the difference between 'string' and 'text' :-/
 
 
  Do you really need 6 warming searchers?
 
 That I have no idea about. Currently it's a very small site, well,  
 visitor-wise anyway.
 
 
  I think date type is pretty granular.  Do you really need that  
  type of precision?
 
 Probably not, have changed it to sint and will index the date in this  
 format 20070310, which should do the trick.
 
 
  I don't have a shell handy here to check, but is that 'M' in -Xmx...  
  recognized, or should it be lowercase 'm'?
 
 "Append the letter k or K to indicate kilobytes, or the letter m or M  
 to indicate megabytes", so yeah, it should recognize it.
 
 
  Have you noticed anything weird while looking at the Solr Java  
  process with jConsole?
 
 I'm not very familiar with Java, so no idea what jConsole is :-/
 
 
 Will be re-indexing tomorrow with the date->sint and text->string  
 changes, will report back after it's done.
 
 Cheers,
 Daniel
 




Re: Help optimizing

2008-05-05 Thread Mike Klaas

On 3-May-08, at 10:06 AM, Daniel Andersson wrote:


Our database/index is 3.5 GB and contains 4,352,471 documents. Most  
documents are less than 1 kb. When performing a search, the results  
vary from 1.5 seconds up to 60 seconds.


I don't have a big problem with 1.5 seconds (even though below 1  
would be nice), but 60 seconds is just.. well, scary.


That is too long, and shouldn't be happening.



How do I optimize Solr to better use all the RAM? I'm using java6,  
64bit version, and start Solr using:

java -Xmx7500M -Xms4096M -jar start.jar

But according to top it only seems to be using 7.7% of the memory  
(around 600 MB).


Don't try to give Solr _all_ the memory on the system.  Solr depends  
on the index existing in the OS's disk cache (this shows up as "cached" in  
top).  You should have at least 2 GB of memory for a 3.5GB index,  
depending on how much of the index is stored (best is of course to  
have 3.5GB available so it can be cached completely).
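On your 8 GB box with a 3.5 GB index, that suggests something more like

java -Xmx2048M -Xms1024M -jar start.jar

(sizes illustrative - tune to your cache settings), leaving roughly 5.5 GB  
for the OS to cache the index, instead of -Xmx7500M.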


Solr will require a wide distribution of queries to warm up (get the  
index in the OS disk cache).   This automatically prioritizes the  
hot spots in the index.  If you want to load the whole thing, 'cd  
datadir; cat * > /dev/null' works, but I don't recommend relying on  
that.


Most queries are for make_id + model_id or city + state and almost  
all of the queries are ordered by datetime_found (newest -> oldest).


How many documents match, typically?  How many documents are returned,  
typically?  How often do you commit() [I suspect frequently, based on  
the problems you are having]?


-Mike



Re: Help optimizing

2008-05-03 Thread Otis Gospodnetic
Daniel,

1.5 seconds is quite long, and 60 seconds is more than scary - smells like 
something is not set up right.
You have a lot of fields of type text, but a number of fields sound like they 
really need not be tokenized and should thus be of type string.
Do you really need 6 warming searchers?
I think date type is pretty granular.  Do you really need that type of 
precision?
I don't have a shell handy here to check, but is that 'M' in -Xmx... recognized, 
or should it be lowercase 'm'?
Have you noticed anything weird while looking at the Solr Java process with 
jConsole?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
 From: Daniel Andersson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, May 3, 2008 7:06:10 PM
 Subject: Help optimizing
 
 Hi (again) people
 
 We've now invested in a server with 8 GB of RAM after too many  
 OutOfMemory-errors.
 
 Our database/index is 3.5 GB and contains 4,352,471 documents. Most  
 documents are less than 1 kb. When performing a search, the results  
 vary from 1.5 seconds up to 60 seconds.
 
 I don't have a big problem with 1.5 seconds (even though below 1 would  
 be nice), but 60 seconds is just.. well, scary.
 
 How do I optimize Solr to better use all the RAM? I'm using java6,  
 64bit version, and start Solr using:
 java -Xmx7500M -Xms4096M -jar start.jar
 
 But according to top it only seems to be using 7.7% of the memory  
 (around 600 MB).
 
 Most queries are for make_id + model_id or city + state and almost all  
 of the queries are ordered by datetime_found (newest -> oldest).
 
 from schema.xml:
 
 <!-- [the field names and types were stripped when this message was
      archived; only the trailing attributes survive] -->
 <field name="..." type="..." required="true" />
 <field name="..." type="..." multiValued="false" />
 <field name="..." type="..." multiValued="false" />
 <field name="..." type="..." multiValues="true" />  <!-- [sic] -->
 <!-- ...some twenty more fields, all multiValued="false"... -->
 <field name="..." type="..." stored="true" multiValued="false" />
 <field name="..." type="..." multiValued="true" />
 
 
 from solrconfig.xml:
  
 <!-- [cache element names were stripped in the archive; filterCache /
      queryResultCache / documentCache is the standard order, and
      autowarmCount="0" fits documentCache, which cannot autowarm] -->
 <filterCache
   class="solr.LRUCache"
   size="4096"
   initialSize="4096"
   autowarmCount="2048"/>

 <queryResultCache
   class="solr.LRUCache"
   size="4096"
   initialSize="4096"
   autowarmCount="2048"/>

 <documentCache
   class="solr.LRUCache"
   size="4096"
   initialSize="4096"
   autowarmCount="0"/>

 <!-- [two element names lost; only the values 50 and 1000 survive] -->
 <...>50</...>
 <...>1000</...>

 <listener event="newSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <!-- [warming query garbled in the archive; it referenced
          datetime_found, 0, and 10] -->
   </arr>
 </listener>

 <listener event="firstSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <!-- [same garbled warming query as above] -->
   </arr>
 </listener>

 <useColdSearcher>true</useColdSearcher>
 <maxWarmingSearchers>6</maxWarmingSearchers>
 
 Any help much appreciated!
 
 Kind regards,
 Daniel