How to handle large field values.

2008-11-05 Thread Luca Molteni
Hello everybody. Dealing with very large fields, let's say text documents, I found that there is an overall slowness (on my computer) in returning those fields. Since most of the time what we want is a highlighted value of the field and not the entire field, I thought that we could omit these fields

Re: Need to write a start.jar file

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
can you tell what exactly you wish to customize? On Wed, Nov 5, 2008 at 10:46 AM, Muhammed Sameer [EMAIL PROTECTED] wrote: Salaam, I read somewhere that it is better to write a new start.jar file than use the one that is provided within the example directory, can someone please guide me

ndis push search results to top

2008-11-05 Thread Simon Collins
Hi Is there an easy way (bearing in mind i'm still very new to this solr lark) to push certain items to the top of search results. For instance, if customers are searching for boots on our site, i might want to push up higher margin products to the top of the results, or push popular items

Re: How to handle large field values.

2008-11-05 Thread Luca Molteni
This worked, thank you very much. Any idea how I can help document it? Can I write in the wiki? Maybe in http://wiki.apache.org/solr/SolrConfigXml#head-13e17f74dde0751b8a7cfe539f631d58029b8080 L.M. 2008/11/5 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED]: the fl must have the unique id

Re: How to handle large field values.

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
The fl must include the unique id field too, because when fl is specified only the listed fields are returned. On Wed, Nov 5, 2008 at 4:36 PM, Luca Molteni [EMAIL PROTECTED] wrote: Uhm, this works great when using only one server, because I can specify the fields in the configuration file, but it
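A minimal Python sketch of the request discussed in this thread, using only the standard library. It assumes a hypothetical schema where "id" is the uniqueKey, "title" is a short field and "body" is the large text field; fl, hl, hl.fl and shards are standard Solr query parameters. The request asks for a highlighted snippet of the body instead of the whole field, and always adds the unique key to fl, since per the reply above a distributed request needs it.

```python
from urllib.parse import urlencode

# Hypothetical schema: "id" is the uniqueKey, "body" holds the large text.
UNIQUE_KEY = "id"

def build_select_params(q, fields, highlight_field=None, shards=None):
    """Build a query string for a Solr select request (sketch only)."""
    # When fl is given, only the listed fields come back, so make sure
    # the unique key is among them (needed to merge sharded results).
    fl = list(fields)
    if UNIQUE_KEY not in fl:
        fl.insert(0, UNIQUE_KEY)
    params = [("q", q), ("fl", ",".join(fl))]
    if highlight_field:
        # Ask for a highlighted snippet instead of the entire stored field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if shards:
        # shards is a comma-separated list of host:port/path entries.
        params.append(("shards", ",".join(shards)))
    return urlencode(params)

print(build_select_params("solr", ["title"], highlight_field="body",
                          shards=["localhost:8983/solr", "localhost:7574/solr"]))
```

The same fl value can instead go into the handler defaults in solrconfig.xml, as suggested elsewhere in the thread; building it per request just makes the unique-key requirement explicit.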

Re: Need to write a start.jar file

2008-11-05 Thread Muhammed Sameer
Salaam, Thanks for the response, I'll only change this if I need any customization done Regards, Muhammed Sameer --- On Wed, 11/5/08, Erik Hatcher [EMAIL PROTECTED] wrote: From: Erik Hatcher [EMAIL PROTECTED] Subject: Re: Need to write a start.jar file To: solr-user@lucene.apache.org Date:

Trying to run solr-1.3.0 under tomcat 5.5.20 on OS X 10.5.5

2008-11-05 Thread Fergus McMenemie
Hello all, I downloaded everything and set it up as per the instructions, and while it does run under jetty, I can not get it to start under tomcat at all. I get the following errors. This is with solrconfig.xml straight from the tgz file. HTTP Status 500 - Severe errors in solr

Re: Need to write a start.jar file

2008-11-05 Thread Erik Hatcher
I've never heard of this need to provide a customized start.jar. Could you send us a pointer to where you read that if you still have that available? But, no, there is no need to provide a different start.jar. However, Jetty is really just one example of how you deploy Solr - any modern

Re: How to handle large field values.

2008-11-05 Thread Luca Molteni
Uhm, this works great when using only one server, because I can specify the fields in the configuration file, but it gives me a nice NullPointerException when using distributed shards: HTTP Status 500 - null java.lang.NullPointerException at

Re: Throughput Optimization

2008-11-05 Thread Erik Hatcher
One quick question: are you seeing any evictions from your filterCache? If so, it isn't set large enough to handle the faceting you're doing. Erik On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: I've been running load tests over the past week or 2, and I can't figure out my
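To make "evictions" and "hit ratio" concrete, here is a toy Python LRU cache that keeps the same kind of counters Solr's cache statistics report (lookups, hits, evictions). It is an illustration under that assumption only, not Solr's LRUCache or FastLRUCache code: once the set of entries in active use exceeds the capacity, every insert pushes out the least recently used entry, which then has to be recomputed on its next lookup.

```python
from collections import OrderedDict

class CountingLRUCache:
    """Toy LRU cache with hit/eviction counters (not Solr's implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.lookups = 0
        self.hits = 0
        self.evictions = 0

    def get(self, key):
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
            self.evictions += 1

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0

cache = CountingLRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # hit
cache.put("c", 3)     # cache full: evicts "b"
cache.get("b")        # miss, "b" was evicted
print(cache.hits, cache.lookups, cache.evictions)  # 1 2 1
```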

Re: How to handle large field values.

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
the 'fl' parameter can be added to the defaults for your search handler in solrconfig.xml On Wed, Nov 5, 2008 at 3:22 PM, Luca Molteni [EMAIL PROTECTED] wrote: Hello everybody, dealing with very large fields, let's say text documents, I found that there is a global slowness (on my computer)

Re: Throughput Optimization

2008-11-05 Thread Yonik Seeley
You're probably hitting some contention with the locking around the reading of index files... this has been recently improved in Lucene for non-Windows boxes, and we're integrating that into Solr (should def be in the next release). -Yonik On Tue, Nov 4, 2008 at 9:01 PM, wojtekpia [EMAIL

Fwd: [Solr Wiki] Update of SolrResources by GrantIngersoll

2008-11-05 Thread Shalin Shekhar Mangar
Thank you Grant, very nicely written! http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01S_CMP=HP -- Forwarded message -- From: Apache Wiki [EMAIL PROTECTED] Date: Wed, Nov 5, 2008 at 7:25 PM Subject: [Solr Wiki] Update of SolrResources by GrantIngersoll

Re: Throughput Optimization

2008-11-05 Thread Mark Miller
The latest alt directory patch uses It. - Mark On Nov 5, 2008, at 9:25 AM, Yonik Seeley [EMAIL PROTECTED] wrote: You're probably hitting some contention with the locking around the reading of index files... this has been recently improved in Lucene for non-Windows boxes, and we're

Large Data Set Suggestions

2008-11-05 Thread Steven Anderson
Greetings! I've been asked to do some indexing performance testing on Solr 1.3 using large XML document data sets (10M-60M docs) with DIH versus SolrJ. Does anyone have any suggestions where I might find a good data set this size? I saw the wikipedia dump reference in the DIH wiki, but

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is .99. It looks like I've hit a RAM bound here. I ran a test without faceting. The response times / throughput were both significantly higher, there

Re: Large Data Set Suggestions

2008-11-05 Thread Fergus McMenemie
Greetings! I've been asked to do some indexing performance testing on Solr 1.3 using large XML document data sets (10M-60M docs) with DIH versus SolrJ. Does anyone have any suggestions where I might find a good data set this size? I saw the wikipedia dump reference in the DIH wiki, but

RE: Retrieving a non-indexed but stored field

2008-11-05 Thread Andrew Nagy
Sorry for the late follow-up. I am doing this, but get nothing back. Can anyone replicate this problem? Andrew From: Erik Hatcher [EMAIL PROTECTED] Sent: Tuesday, October 14, 2008 12:36 PM To: solr-user@lucene.apache.org Subject: Re: Retrieving a

Re: Throughput Optimization

2008-11-05 Thread christophe
Does the number of searchers affect CPU usage? Not totally sure about it, but I think some versions of Tomcat were not totally scalable over 4 CPUs (or 4 cores). C. wojtekpia wrote: Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
If you are seeing 90% CPU usage and are not IO (File or Network) bound, then you are most probably bound by lock contention. If your CPU usage goes down as you throw more threads at the box, that's an even bigger indication that that is the issue. A good profiling tool should help you locate

Re: Throughput Optimization

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 11:14 AM, wojtekpia [EMAIL PROTECTED] wrote: Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is .99. It looks like I've hit a RAM bound here. Evictions on the filterCache

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
What are your other cache hit rates looking like? Which caches are you using the FastLRUCache on? -Todd Feak -Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 8:15 AM To: solr-user@lucene.apache.org Subject: Re: Throughput Optimization

Need help with Parsing user Query

2008-11-05 Thread Rajiv2
Hi, I need help with solving a particular problem I'm having. I have a one-box search website where users can type "cosmetic surgery houston tx" or "cosmetic surgery 22151". I need to come up with a reliable way to parse out the geo terms or zip codes from the user query so that I can submit a query
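One simple approach, sketched in Python: pull out anything that looks like a US zip code with a regular expression, then match the remaining tokens against lists of known place names. The STATE_CODES and CITIES sets below are hypothetical stand-ins for a real gazetteer; the sketch only shows the splitting step, not how the geo part is then turned into a Solr filter.

```python
import re

# Hypothetical gazetteers; a real system would load these from a data source.
STATE_CODES = {"tx", "va", "ca", "ny"}
CITIES = {"houston", "dallas", "austin"}
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

def split_geo(query):
    """Split a free-text query into (topic terms, geo dict). Sketch only."""
    geo = {}
    zip_match = ZIP_RE.search(query)
    if zip_match:
        geo["zip"] = zip_match.group()
        query = query[:zip_match.start()] + query[zip_match.end():]
    topic = []
    for token in query.lower().split():
        if token in STATE_CODES and "state" not in geo:
            geo["state"] = token
        elif token in CITIES and "city" not in geo:
            geo["city"] = token
        else:
            topic.append(token)
    return " ".join(topic), geo

print(split_geo("cosmetic surgery houston tx"))
print(split_geo("cosmetic surgery 22151"))
```

Plain dictionary lookup gets unreliable with real state codes such as "or", "in" or "me", which are also ordinary English words; those usually need extra context (for example, only accepting a state code right after a known city).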

question about Solr directories on mounted file systems

2008-11-05 Thread Jim Adams
I have an application that is using SOLR on a mounted file system. However, machine or human error can sometimes unmount the file system. This causes Solr to write index files to a different area from the index I am using. This also means that the index instance becomes corrupt, because some

Re: question about Solr directories on mounted file systems

2008-11-05 Thread Walter Underwood
I do not recommend using Lucene or Solr on a mounted file system. My implementation was 100X faster after I moved it from NFS to local disk. --wunder On 11/5/08 10:01 AM, Jim Adams [EMAIL PROTECTED] wrote: I have an application that is using SOLR on a mounted file system. However, machine or

Regex Transformer Error

2008-11-05 Thread Ahmed Hammad
Hi, I am using Solr 1.3 data import handler. One of my table fields has HTML tags, and I want to strip them from the field text. So obviously I need the Regex Transformer. I added the transformer="RegexTransformer" attribute to my entity and a new field with: <field sourceColName="content" column="content"

Re: Retrieving a non-indexed but stored field

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 11:47 AM, Andrew Nagy [EMAIL PROTECTED] wrote: Sorry for the late follow-up. I am doing this, but get nothing back. Did you change the field to stored in the schema after you added the document? I've never seen anyone having this problem, so perhaps verify that you are

Re: Regex Transformer Error

2008-11-05 Thread Ahmed Hammad
Hi, it works with the attribute regex="&lt;(.|\n)*?&gt;". Sorry for the disturbance. Regards, ahmd On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi, I am using Solr 1.3 data import handler. One of my table fields has html tags, I want to strip it of the field text. So
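For illustration, the same tag-stripping pattern in Python, outside of DIH. In the XML config the angle brackets have to be escaped as &lt; and &gt;; unescaped, the pattern is "<", then any characters including newlines (non-greedy), then ">". DIH runs it with Java's regex engine, but for this pattern Python's re module gives the same result.

```python
import re

# "<", then any characters including newlines (non-greedy), then ">".
TAG_RE = re.compile(r"<(.|\n)*?>")

def strip_tags(html):
    """Remove anything that looks like an HTML tag (sketch)."""
    return TAG_RE.sub("", html)

print(strip_tags("<p>Hello <b>world</b></p>"))  # Hello world
print(strip_tags('<a\nhref="x">link</a>'))      # link
```

A regex like this does not handle every case, for example a ">" inside a quoted attribute value, and it leaves the contents of script or style elements in place. That is one reason to consider the HTML stripper built into Solr, mentioned later in this thread.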

Re: Redirecting output of post.jar and start.jar

2008-11-05 Thread Ryan McKinley
On Nov 5, 2008, at 7:30 AM, Muhammed Sameer wrote: Salaam, When I run post.jar or start.jar its throws a lot of information on the screen, I even tried redirecting the info but that does not seem to help, I have configured a cron to run post.jar to run every 2mins to keep the index

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
Where is the alt directory in the source tree (or what is the JIRA issue number)? I'd like to apply this patch and re-run my tests. Does changing the lockType in solrconfig.xml address this issue? (My lockType is the default - single). markrmiller wrote: The latest alt directory patch uses

RE: Throughput Optimization

2008-11-05 Thread wojtekpia
My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak, Todd wrote: What are your other cache hit rates looking like? Which caches are you using the FastLRUCache on? -Todd Feak -Original Message- From: wojtekpia

RE: Retrieving a non-indexed but stored field

2008-11-05 Thread Andrew Nagy
Nope - I made the schema change and then indexed all of my content. I can confirm that the URL string is included, cause when I change my schema back to have both stored and indexed, it shows the URL data in the search results. When I change it to stored and not indexed, no data is returned.

Re: Retrieving a non-indexed but stored field

2008-11-05 Thread Erick Erickson
What's the query you're hitting SOLR with? If it's on the URL field, that would match your behavior I.e. if you're getting results based upon whether you index the field or not, it would be neatly explained by whether you're *searching* on that field. Best [EMAIL PROTECTED] P.S. Luke might

Re: Retrieving a non-indexed but stored field

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 2:09 PM, Andrew Nagy [EMAIL PROTECTED] wrote: Nope - I made the schema change and then indexed all of my content. I can confirm that the URL string is included, cause when I change my schema back to have both stored and indexed, it shows the URL data in the search

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
Yonik said something about the FastLRUCache giving the most gain for high hit-rates and the LRUCache being faster for low hit-rates. It's in his Nov 1 comment on SOLR-667. I'm not sure if anything changed since then, as it's an active issue, but you may want to try the LRUCache for your query

Re: Throughput Optimization

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 2:44 PM, wojtekpia [EMAIL PROTECTED] wrote: I'll try changing my other caches to LRUCache and observe performance. Interestingly, the FastLRUCache has given me a ~10% increase in performance, much lower than I've read on the SOLR-667 thread. That's better than I would

Highlighting Oddities

2008-11-05 Thread Chris Harris
I'm testing out the default (gap) fragmenter with some simple, single-word queries on a patched 1.3.0 release populated with some real-world data. (I think the primary quirk in my setup is that I'm using ShingleFilterFactory to put word bigrams (aka shingles) into my index. I was worried that this

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? Yonik Seeley wrote: You're probably hitting some contention with the locking around the reading of index files... this has been recently improved in

Re: Throughput Optimization

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 5:18 PM, wojtekpia [EMAIL PROTECTED] wrote: I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? You need to apply this source code patch to Solr:

Re: Large Data Set Suggestions

2008-11-05 Thread souravm
Hi Fergus, Do the 6.6M docs reside on a single box (node) or on multiple boxes? Do you use distributed search? Regards, Sourav - Original Message - From: Fergus McMenemie [EMAIL PROTECTED] To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wed Nov 05 08:21:45 2008

Re: How to use multicore feature in JBOSS

2008-11-05 Thread Norberto Meijome
On Tue, 4 Nov 2008 23:45:40 -0800 (PST) con [EMAIL PROTECTED] wrote: But for the first question, I am still not clear. I think to use the multicore feature we should inform the server. In the Jetty server, we are starting the server using: java -Dsolr.solr.home=multicore -jar start.jar

RE: Regex Transformer Error

2008-11-05 Thread Norskog, Lance
There is a nice HTML stripper inside Solr. solr.HTMLStripStandardTokenizerFactory -Original Message- From: Ahmed Hammad [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 10:43 AM To: solr-user@lucene.apache.org Subject: Re: Regex Transformer Error Hi, It works with the

Bias score proximity for a given field

2008-11-05 Thread Nguyen, Joe
Hi, is there a way to specify a range boosting for a numeric/date field? Suppose I have articles whose published dates are in 2005,...,2008,...,2011. I want to boost the score of 2008 articles by 20%. Articles whose published dates are 3 years away from 2008 would be boosted by 0%, e.g.
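The arithmetic the question describes can be written as a boost multiplier that peaks at the target year and falls to no boost at 3 years away. The question is cut off before it says how the boost should behave in between, so the linear falloff below is an assumption, and the function name and parameters are hypothetical. This sketch only computes the desired multiplier; it does not show how to express it inside a Solr query.

```python
def year_boost(year, center=2008, max_boost=0.20, radius=3):
    """Multiplier of 1 + max_boost at `center`, falling linearly to 1.0
    at `radius` years away (linear falloff is an assumption)."""
    distance = abs(year - center)
    weight = max(0.0, 1.0 - distance / radius)
    return 1.0 + max_boost * weight

for y in (2005, 2007, 2008, 2009, 2011):
    print(y, round(year_boost(y), 4))
```

With these defaults 2008 gets 1.2 (a 20% boost), 2007 and 2009 get about 1.1333, and 2005, 2011 and anything further away get 1.0.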

Re: Regex Transformer Error

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you try w/o escaping the '' characters? On Wed, Nov 5, 2008 at 11:48 PM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi, I am using Solr 1.3 data import handler. One of my table fields has html tags, I want to strip it of the field text. So obviously I need the Regex Transformer. I added

Re: Large Data Set Suggestions

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH is likely to be faster than SolrJ, because it does not have the overhead of an HTTP request. What is your data source? I am assuming it is XML. SolrJ cannot directly index XML; you may need to read the docs from the XML before SolrJ can index them. --Noble On Wed, Nov 5, 2008
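Reading the documents out of a large XML file before handing them to a client can be done as a stream, so that millions of records are not held in memory at once. A minimal Python sketch with the standard library's ElementTree.iterparse, assuming a hypothetical layout where each document is a <record> element whose children are the fields. Each resulting dict would then be converted into whatever document type the indexing client expects (in SolrJ that would be a SolrInputDocument); that step is not shown.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, record_tag="record"):
    """Stream <record> elements as field dicts (record_tag is hypothetical)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Each child element becomes one field: <title>x</title> -> {"title": "x"}
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # discard the record's children once processed

sample = (b"<records><record><id>1</id><title>First</title></record>"
          b"<record><id>2</id><title>Second</title></record></records>")
for doc in iter_records(io.BytesIO(sample)):
    print(doc)
```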

Search in SOLR multi cores in a single request

2008-11-05 Thread gurudev
I have been reading the SOLR 1.3 wiki, which says that to fetch documents from each core in a multi-core setup we need to request each core independently. I was under the impression that the SOLR multi-core feature might be using Lucene's MultiSearcher to search among multiple cores. Anyone with

Re: Search in SOLR multi cores in a single request

2008-11-05 Thread Shalin Shekhar Mangar
The idea behind multicore is that you will use them if you have completely different types of documents (basically multiple schemas). You might want to look at Distributed Search, which allows for sharding of the data on multiple servers and searching them all.