Re: Nutch with SOLR

2007-09-26 Thread Doğacan Güney
On 9/26/07, Brian Whitman [EMAIL PROTECTED] wrote: Sami has a patch in there which used an older version of the Solr client. With the current Solr client in the SVN tree, his patch becomes much easier. Your job would be to upgrade the patch and mail it back to him so he can update his

custom sorting

2007-09-26 Thread Sandeep Shetty
Hi guys, this question has been asked before, but I was unable to find an answer that's good for me, so I hope you can help again. I am working on a website where we need to sort the results by distance from the location entered by the user. I have indexed the lat and long info for each

Result grouping options

2007-09-26 Thread Thomas
Hello, for the project I'm working on now it is important to group the results of a query by a product field. Documents belong to only one product and there will never be more than 10 different products altogether. When searching through the archives I identified 3 options: 1)

[JOB] Full-time opportunity in Paris, France

2007-09-26 Thread nicolas . dessaigne
Arisem is a French ISV delivering best-of-breed text analytics software. We have been using Lucene in our products since 2001 and are in search of a Lucene expert to complement our R&D team. Required skills: - Master's degree in computer science - 2+ years of experience in working with Lucene -

Re: Nutch with SOLR

2007-09-26 Thread Brian Whitman
On Sep 26, 2007, at 4:04 AM, Doğacan Güney wrote: NUTCH-442 is one of the issues that I want to really see resolved. Unfortunately, I haven't received many (as in, none) comments, so I haven't made further progress on it. I am probably your target customer but to be honest all we care about

dataset parameters suitable for lucene application

2007-09-26 Thread Law, John
I am new to the list and new to lucene and solr. I am considering Lucene for a potential new application and need to know how well it scales. Following are the parameters of the dataset. Number of records: 7+ million Database size: 13.3 GB Index Size: 10.9 GB My questions are simply: 1)

Re: dataset parameters suitable for lucene application

2007-09-26 Thread Walter Underwood
That seems well within Solr's capabilities, though you should come up with a desired queries/sec figure. Solr's query rate varies widely with the configuration -- how many fields, fuzzy search, highlighting, facets, etc. Essentially, Solr uses Lucene, a modern search core. It has performance and

2 indexes

2007-09-26 Thread philguillard
Hi, I'm new to Solr, sorry if I missed my answer in the docs somewhere... I need 2 different Solr indexes. Should I create 2 webapps? In that case I have Tomcat contexts solr and solr2, but then I can't start solr2; I get this error: Sep 26, 2007 6:07:25 PM

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Charlie Jackson
My experiences so far with this level of data have been good. Number of records: Maxed out at 8.8 million Database size: friggin huge (100+ GB) Index size: ~24 GB 1) It took me about a day to index 8 million docs using a non-optimized program I wrote. It's non-optimized in the sense that it's

searching for non-empty fields

2007-09-26 Thread Brian Whitman
I have a large index with a field for a URL. For some reason or another, sometimes a doc will get indexed with that field blank. This is fine but I want a query to return only the set URL fields... If I do a query like: q=URL:[* TO *] I get a lot of empty fields back, like: docstr

Re: dataset parameters suitable for lucene application

2007-09-26 Thread Chris Harris
By "maxed out" do you mean that Solr's performance became unacceptable beyond 8.8M records, or that you only had 8.8M records to index? If the former, can you share the particular symptoms? On 9/26/07, Charlie Jackson [EMAIL PROTECTED] wrote: My experiences so far with this level of data have been

How to get debug information while indexing?

2007-09-26 Thread Urvashi Gadi
Hi, I am trying to create my own application using Solr, and while trying to index my data I get "Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update" or "Server returned HTTP response code: 500 for URL: http://localhost:8983/solr/update". Is there a way to get more

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Charlie Jackson
Sorry, I meant that it maxed out in the sense that my maxDoc field on the stats page was 8.8 million, which indicates that the most docs it has ever had was around 8.8 million. It's down to about 7.8 million currently. I have seen no signs of a maximum number of docs Solr can handle.

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Law, John
Thanks all! One last question... If I had a collection of 2.5 billion docs and a demand averaging 200 queries per second, what's the confidence that Solr/Lucene could handle this volume and execute search with sub-second response times? -Original Message- From: Charlie Jackson

Re: dataset parameters suitable for lucene application

2007-09-26 Thread Walter Underwood
No one can answer that, because it depends on how you configure Solr. How many fields do you want to search? Are you using fuzzy search? Facets? Highlighting? We are searching a much smaller collection, about 250K docs, with great success. We see 80 queries/sec on each of four servers, and

Re: How to get debug information while indexing?

2007-09-26 Thread Yonik Seeley
On 9/26/07, Urvashi Gadi [EMAIL PROTECTED] wrote: Hi, I am trying to create my own application using Solr, and while trying to index my data I get "Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update" or Server returned HTTP response code: 500 for URL:
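A bare 400 or 500 status code says little on its own; the response body that comes with it often carries the server's own explanation. The following is only a minimal Python sketch of reading that body from a failed update request. The helper names `describe_http_error` and `post_update` are hypothetical, and it assumes (not confirmed in this thread) that the server puts a useful message in the error response. The quoted "Server returned HTTP response code" text is what a Java URL connection reports; the equivalent idea there would be reading the connection's error stream.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Return the status line plus whatever body text the server sent with the error."""
    body = err.read().decode("utf-8", errors="replace").strip()
    status = f"HTTP {err.code} {err.reason}"
    return f"{status}: {body}" if body else status


def post_update(xml_payload, url="http://localhost:8983/solr/update"):
    """POST an update message; on a 4xx/5xx, re-raise with the response body attached.

    The default URL is the one quoted in the thread; the payload format is up to the caller.
    """
    request = urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is file-like, so the error body can be read like a normal response.
        raise RuntimeError(describe_http_error(err)) from err
```

The servlet container's log is the other place to look, since a 500 usually leaves a stack trace there.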

Geographical distance searching

2007-09-26 Thread Lance Norskog
It is a best practice to store the master copy of this data in a relational database and use Solr/Lucene as a high-speed cache. MySQL has a geographical database option, so maybe that is a better option than Lucene indexing. Lance (P.S. Please start new threads for new topics.) -Original

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Lance Norskog
My limited experience with larger indexes is: 1) the logistics of copying around and backing up this much data, and 2) indexing is disk-bound. We're on SAS disks and it makes no difference between one indexing thread and a dozen (we have small records). Smaller returns are faster. You need to

Re: dataset parameters suitable for lucene application

2007-09-26 Thread Mike Klaas
On 26-Sep-07, at 10:50 AM, Law, John wrote: Thanks all! One last question... If I had a collection of 2.5 billion docs and a demand averaging 200 queries per second, what's the confidence that Solr/Lucene could handle this volume and execute search with sub-second response times? No

Re: custom sorting

2007-09-26 Thread Mike Klaas
On 26-Sep-07, at 5:14 AM, Sandeep Shetty wrote: Hi guys, this question has been asked before, but I was unable to find an answer that's good for me, so I hope you can help again. I am working on a website where we need to sort the results by distance from the location entered by the user. I

What is facet?

2007-09-26 Thread Teruhiko Kurosaka
Could someone tell me what a facet is? I have a vague idea, but I am not too clear on it. A pointer to a sample web site that uses Solr faceting would be very good. Thanks. -Kuro

RE: Geographical distance searching

2007-09-26 Thread Will Johnson
With the new/improved value source functions it should be pretty easy to develop a new best practice. You should be able to pull in the lat/lon values from value source fields and then do your great-circle calculation. - will -Original Message- From: Lance Norskog [mailto:[EMAIL
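For readers unfamiliar with the great-circle calculation mentioned here, the following is a minimal standalone Python sketch using the haversine formula. It is not the Solr value source API and is not taken from the thread. The function names, the `lat`/`lon` dictionary keys, and the mean Earth radius constant are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not from the thread


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def sort_by_distance(docs, user_lat, user_lon):
    """Return docs ordered nearest-first; each doc is a dict with hypothetical 'lat'/'lon' keys."""
    return sorted(
        docs,
        key=lambda d: great_circle_km(user_lat, user_lon, d["lat"], d["lon"]),
    )
```

Sorting this way on the client only works for the documents already fetched; ordering a whole result set by distance would need the calculation to run inside the search engine, which is what the value source suggestion is aiming at.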

Converting German special characters / umlaute

2007-09-26 Thread Matthias Eireiner
Dear list, I have two questions regarding German special characters, or umlauts. Is there an analyzer which automatically converts all German special characters to their specific dissected form, such as ü to ue and ä to ae, etc.? I would also like the search to always run against

Re: What is facet?

2007-09-26 Thread Ezra Ball
Faceted search is an approach to search where a taxonomy or categorization scheme is visible in addition to document matches. http://www.searchtools.com/info/faceted-metadata.html --Ezra. On 9/26/07 3:47 PM, Teruhiko Kurosaka [EMAIL PROTECTED] wrote: Could someone tell me what a facet is? I

Re: Converting German special characters / umlaute

2007-09-26 Thread Thomas Traeger
Try the SnowballPorterFilterFactory described here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters You should use the German2 variant that converts ä and ae to a, ö and oe to o and so on. More details: http://snowball.tartarus.org/algorithms/german2/stemmer.html Every document in

Re: What is facet?

2007-09-26 Thread Chris Hostetter
: Faceted search is an approach to search where a taxonomy or categorization : scheme is visible in addition to document matches. My ApacheConUS2006 talk went into a little more detail, including the best definition of faceted searching/browsing I've ever seen...

Re: Converting German special characters / umlaute

2007-09-26 Thread Chris Hostetter
: Is there an analyzer which automatically converts all German special : characters to their specific dissected form, such as ü to ue and ä to : ae, etc.? See also the ISOLatin1TokenFilter, which does this regardless of language. : I would also like the search to always run

Re: Geographical distance searching

2007-09-26 Thread Ian Holsman
Have you guys seen Local Lucene? http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm No need for MySQL if you don't want to. rgrds Ian Will Johnson wrote: With the new/improved value source functions it should be pretty easy to develop a new best practice. You should be

Re: searching for non-empty fields

2007-09-26 Thread Pieter Berkel
I've experienced a similar problem before. Assuming the field type is string (i.e. not tokenized), there is a subtle yet important difference between a field that is null (i.e. not contained in the document) and one that is an empty string (in the document but with no value). See

anyone can send me jetty-plus

2007-09-26 Thread James liu
I can't download it from http://jetty.mortbay.org/jetty5/plus/index.html -- regards jl

Re: searching for non-empty fields

2007-09-26 Thread Ryan McKinley
Your query will work if you make sure the URL field is omitted from the document at index time when the field is blank. Adding something like: <filter class="solr.LengthFilterFactory" min="1" max="1" /> to the schema field should do it without needing to ensure it is not null or on the
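The first suggestion, leaving the field out of the document when it is blank, can also be handled on the client before the document is posted. The following is a minimal Python sketch, assuming documents are sent in Solr's XML `<add><doc>` update format. The function name `build_add_xml` and the `id` field are hypothetical; `URL` is the field from this thread.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc> update message, skipping fields whose value is None or blank."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        if value is None or not str(value).strip():
            continue  # omit the field entirely rather than index an empty string
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

With blank values dropped this way, a document without a URL simply has no `URL` field, so a range query such as `URL:[* TO *]` should no longer match it.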

Re: searching for non-empty fields

2007-09-26 Thread Chris Hostetter
: Date: Thu, 27 Sep 2007 00:12:48 -0400 : From: Ryan McKinley [EMAIL PROTECTED] : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: searching for non-empty fields : : : Your query will work if you make sure the URL field is omitted from the : document at