Re: Spellchecker and Dismax both

2008-08-14 Thread Shalin Shekhar Mangar
The SpellCheckerRequestHandler is now deprecated with Solr 1.3 and it has been replaced by SpellCheckComponent. http://wiki.apache.org/solr/SpellCheckComponent On Thu, Aug 14, 2008 at 3:42 AM, anshuljohri [EMAIL PROTECTED] wrote: Hi, I am using dismax handler and I want to use spellchecker

Re: Best way to index without diacritics

2008-08-14 Thread Norberto Meijome
(2 in 1 reply) On Wed, 13 Aug 2008 09:59:21 -0700 Walter Underwood [EMAIL PROTECTED] wrote: Stripping accents doesn't quite work. The correct translation is language-dependent. In German, o-dieresis should turn into oe, but in English, it should be o (as in coöperate or Mötley Crüe). In
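The language dependence Walter describes can be sketched in a few lines of Python. This is only an illustration, not the Lucene or Solr filter under discussion: the function name `fold_diacritics`, the `LANGUAGE_OVERRIDES` table, and the choice of German mappings are assumptions made for the example.

```python
import unicodedata

# Hypothetical per-language overrides; only German is filled in here.
LANGUAGE_OVERRIDES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"},
}

def fold_diacritics(text, lang="en"):
    """Fold accented characters, applying language-specific rules first."""
    overrides = LANGUAGE_OVERRIDES.get(lang, {})
    # Work on precomposed characters so the override lookup sees 'ö', not 'o' + mark.
    text = unicodedata.normalize("NFC", text)
    pieces = []
    for ch in text:
        if ch in overrides:
            pieces.append(overrides[ch])
            continue
        # Generic fallback: decompose and drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        pieces.append("".join(c for c in decomposed
                              if not unicodedata.combining(c)))
    return "".join(pieces)

print(fold_diacritics("schön", "de"))
print(fold_diacritics("Mötley Crüe", "en"))
```

The same input character folds differently depending on the `lang` argument, which is why a single accent-stripping filter cannot be correct for every language.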

List of available facet fields returned with the query results

2008-08-14 Thread Barry Harding
Hi, I have Solr set up to index technical data for a number of different types of products, and this means that different products have different facet fields available. For example, here is a small sample of the sort of data we are indexing; in reality there are between 10 and 20 facet

Re: Administrative questions

2008-08-14 Thread Jason Rennie
On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote: Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :) My pleasure. Was nice to hear recently that DJB is moving toward more flexible licensing terms. For

RE: Exception during Solr startup

2008-08-14 Thread Kashyap, Raghu
Hi Yonik and Erik, Thanks to both of you. It seems like our container had some issues and was causing this problem. Thanks, Raghu -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yonik Seeley Sent: Wednesday, August 13, 2008 10:57 AM To:

Re: List of available facet fields returned with the query results

2008-08-14 Thread Shalin Shekhar Mangar
Hi Barry, If each category has an exclusive set of fields on which you want to facet, then you can simply facet on all facet-able fields (across all categories). The ones which are not present for the selected category will show up with zero facets, which your front-end can suppress. However if
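The front-end suppression Shalin mentions could look roughly like the Python sketch below. It assumes the facet counts arrive as a mapping from field name to a flat [value, count, value, count, ...] list; the helper name `nonempty_facets` and the sample field names are hypothetical.

```python
def nonempty_facets(facet_fields):
    """Drop zero-count values, then drop fields with nothing left."""
    result = {}
    for field, flat in facet_fields.items():
        # Pair up the flat [value, count, value, count, ...] list.
        pairs = zip(flat[0::2], flat[1::2])
        kept = [(value, count) for value, count in pairs if count > 0]
        if kept:
            result[field] = kept
    return result

# Hypothetical facet counts for one selected category.
sample = {
    "voltage": ["12V", 4, "24V", 0],
    "thread_size": ["M4", 0, "M6", 0],
}
print(nonempty_facets(sample))  # {'voltage': [('12V', 4)]}
```

Depending on the Solr version in use, the `facet.mincount` request parameter may make the server omit zero-count values so that less filtering is needed on the client.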

RE: Best way to index without diacritics

2008-08-14 Thread Steven A Rowe
Hi Norberto, On 08/14/2008 at 8:10 AM, Norberto Meijome wrote: On 8/13/08 9:16 AM, Steven A Rowe [EMAIL PROTECTED] wrote: Hi Norberto, https://issues.apache.org/jira/browse/LUCENE-1343 hi Steve, thanks for the pointer. this is a Lucene entry... I thought the Latin-filter was

RE: List of available facet fields returned with the query results

2008-08-14 Thread Barry Harding
Hi Shalin, As there is certainly the potential for several thousand different attribute types across all of our categories, I guess I will have to manage them myself (was hoping for a short-cut or that I was missing a trick) but no problem. Solr still seems to outperform the commercial package we

Re: Index size vs. number of documents

2008-08-14 Thread Phillip Farber
Erick Erickson wrote: I'm surprised, as you are, by the non-linearity. Out of curiosity, what is your MaxFieldLength? By default only the first 10,000 tokens are added to a field per document. If you haven't set this higher, that could account for it. We set it to a very large number so we

Synonyms help in 1.3-HEAD?

2008-08-14 Thread Matthew Runo
Hello folks! Having a heck of a time trying to get a synonyms file to work properly. It seems that something's wrong with the way it's been set up, but, honestly, I can't see anything wrong with it. Some samples... This works... zutanoapparel => zutano But this does not... aadias, aadidas,

Re: Synonyms help in 1.3-HEAD?

2008-08-14 Thread Yonik Seeley
There should be no limit, so you may have uncovered a bug. Could you open a JIRA issue? If it's a real bug, it should get fixed before 1.3. -Yonik On Thu, Aug 14, 2008 at 12:35 PM, Matthew Runo [EMAIL PROTECTED] wrote: Hello folks! Having a heck of a time trying to get a synonyms file to

Re: Synonyms help in 1.3-HEAD?

2008-08-14 Thread Matthew Runo
Thank you for your suggestion, I really don't see anything 'wrong' with the longer lists.. I entered https://issues.apache.org/jira/browse/SOLR-702 for this issue, and attached relevant files. If you need anything more, don't hesitate to contact me! Thanks for your time! Matthew Runo

Duplicate Data Across Fields

2008-08-14 Thread wojtekpia
I have 2 fields which will sometimes contain the same data. When they do contain the same data, am I paying the same performance cost as when they contain unique data? I think the real question here is: does Lucene index values per field, or per document? -- View this message in context:

Re: spellcheck collation

2008-08-14 Thread Grant Ingersoll
I believe I just fixed this on SOLR-606 (thanks to Stefan's patch). Give it a try and let us know. -Grant On Aug 13, 2008, at 2:25 PM, Doug Steigerwald wrote: I've noticed a few things with the new spellcheck component that seem a little strange. Here's my document: doc field

Re: spellcheck collation

2008-08-14 Thread Doug Steigerwald
I'd try, but the build is failing from (guessing) Ryan's last commit: compile: [mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core [javac] Compiling 337 source files to /Users/dsteiger/Desktop/java/solr/build/core [javac]

More files in index directory than expected

2008-08-14 Thread Chris Harris
It's my understanding that if my mergeFactor is 10, then there shouldn't be more than 11 segments in my index directory (10 segments, plus an additional segment if a merge is in progress). It would seem to follow that there shouldn't be more than 11 fdt files, 11 tis files, etc. However, I'm

Re: spellcheck collation

2008-08-14 Thread Ryan McKinley
have you updated recently? isEnabled() was removed last night... On Aug 14, 2008, at 2:30 PM, Doug Steigerwald wrote: I'd try, but the build is failing from (guessing) Ryan's last commit: compile: [mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core [javac] Compiling 337

Re: term list

2008-08-14 Thread Jack Tuhman
Hmm, I am new to the world of search. I am looking for something that will give me a list of significant words or phrases extracted from a document stored in Solr. Jack On Fri, Aug 8, 2008 at 9:33 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: See https://issues.apache.org/jira/browse/SOLR-651.

IndexOutOfBoundsException

2008-08-14 Thread Ian Connor
Hi, I have rebuilt my index a few times (it should get up to about 4 Million but around 1 Million it starts to fall apart). Exception in thread Lucene Merge Thread #0 org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33 at

Highlighting returns incorrect text on some results?

2008-08-14 Thread pdovyda2
This is kind of a strange issue, but when I submit a query and ask for highlighting back, sometimes the highlighted text includes a question mark at the beginning, although a question mark character does not appear in the field that the highlighted text is taken from. I've put some sample XML

Re: spellcheck collation

2008-08-14 Thread Doug Steigerwald
Right before I sent the message. Did a 'svn up src/;ant clean;ant dist' and it failed. Seems to work fine now. On Aug 14, 2008, at 2:38 PM, Ryan McKinley wrote: have you updated recently? isEnabled() was removed last night... On Aug 14, 2008, at 2:30 PM, Doug Steigerwald wrote: I'd

Re: term list

2008-08-14 Thread Grant Ingersoll
Assuming you mean significant in the traditional IR sense, I would start with the MoreLikeThis. See http://wiki.apache.org/solr/MoreLikeThisHandler In particular the mlt.interestingTerms option. As for phrases, that is a bit harder. You could try playing around with token-based n-grams
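As a rough illustration of Grant's suggestion, the Python sketch below builds a request URL that asks MoreLikeThis for the interesting terms of one document. The base URL, the `/mlt` handler path, the document query, and the field name are all assumptions for the example; `mlt.fl` and `mlt.interestingTerms` are the MoreLikeThis parameters described on the wiki page above.

```python
from urllib.parse import urlencode

def mlt_terms_url(base_url, doc_query, fields, mode="details"):
    """Build a MoreLikeThis request URL that asks for interesting terms."""
    params = {
        "q": doc_query,                # query selecting the source document
        "mlt.fl": ",".join(fields),    # fields to draw terms from
        "mlt.interestingTerms": mode,  # "list", "details", or "none"
    }
    # Assumes the handler is registered at /mlt in solrconfig.xml.
    return base_url.rstrip("/") + "/mlt?" + urlencode(params)

# Hypothetical host, document id, and field name.
print(mlt_terms_url("http://localhost:8983/solr", "id:doc1", ["text"]))
```

With `mode="details"` the response should include each term together with its boost; `"list"` returns the terms alone.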

QueryResultsCache and DocSet filter

2008-08-14 Thread Kevin Osborn
We have a bunch of user caches that return DocSet objects. So, we intersect them and send a DocSet filter and the actual query to getDocListAndSet or getDocList. The problem here is that the calls in SolrIndexSearcher don't appear to use the QueryResultsCache if the filter is a DocSet rather

NOTICE - solrj MultiCore{Params/Request/Response} have been renamed CoreAdmin{Params/Request/Response}

2008-08-14 Thread Ryan McKinley
In the effort to clean up confusion around MultiCore usage, we have renamed the classes that handle runtime core administration from MultiCoreX to CoreAdminX. Additionally, the path that the default MultiCoreRequest expects to hit is: /admin/cores rather than /admin/multicore -- if you have

Re: IndexOutOfBoundsException

2008-08-14 Thread Yonik Seeley
Yikes... not good. This shouldn't be due to anything you did wrong Ian... it looks like a lucene bug. Some questions: - what platform are you running on, and what JVM? - are you using multicore? (I fixed some index locking bugs recently) - are there any exceptions in the log before this? - how

Re: QueryResultsCache and DocSet filter

2008-08-14 Thread Yonik Seeley
On Thu, Aug 14, 2008 at 3:15 PM, Kevin Osborn [EMAIL PROTECTED] wrote: The problem here is that the calls in SolrIndexSearcher don't appear to use the QueryResultsCache if the filter is a DocSet rather than a List<Query>. Right... using a DocSet as part of the cache key would be pretty slow (key

Re: Highlighting returns incorrect text on some results?

2008-08-14 Thread Otis Gospodnetic
Paul, we had many highlighter-related changes since 1.2, so I suggest you try the nightly. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: pdovyda2 [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, August 14, 2008 2:56:42

Simple Searching Question

2008-08-14 Thread Jake Conk
Hello, I inserted the following documents into Solr: --- <add> <doc> <field name="id">124</field> <field name="foobar_facet">Jake Conk</field> </doc> <doc> <field name="id">125</field> <field name="foobar_facet">Jake Conk</field> </doc>

Re: Simple Searching Question

2008-08-14 Thread Shalin Shekhar Mangar
Hi Jake, What is the type of the foobar_facet field in your schema.xml? Did you add foobar_facet as the default search field? On Fri, Aug 15, 2008 at 3:13 AM, Jake Conk [EMAIL PROTECTED] wrote: Hello, I inserted the following documents into Solr:

Re: More files in index directory than expected

2008-08-14 Thread Chris Harris
On Thu, Aug 14, 2008 at 2:01 PM, Michael McCandless [EMAIL PROTECTED] wrote: Chris Harris [EMAIL PROTECTED] wrote: It's my understanding that if my mergeFactor is 10, then there shouldn't be more than 11 segments in my index directory (10 segments, plus an additional segment if a merge is in

Re: More files in index directory than expected

2008-08-14 Thread Mark Miller
The main thing that bugs me about this index now is that the latest version of Luke (0.8.1) won't open it. (Unknown format version: -6) The Solr Luke handler works fine with it, though. Luke probably comes with a released version of Lucene, while Solr is using a later version. You have to

Re: Simple Searching Question

2008-08-14 Thread Jake Conk
Hi Shalin, foobar_facet is a dynamic field. It's defined in my schema like this: <dynamicField name="*_facet" type="string" indexed="true" stored="true"/> I have the default search field set to text. Can I use more than one default search field? <defaultSearchField>text</defaultSearchField> Thanks, - Jake

Re: More files in index directory than expected

2008-08-14 Thread Yonik Seeley
On Thu, Aug 14, 2008 at 6:31 PM, Chris Harris [EMAIL PROTECTED] wrote: (The only time a segment will be modified is if you delete files from it, and that will only alter the segment's .del file, leaving .tis and friends alone.) Actually, these days .del files are even versioned. I don't know

Re: Best way to index without diacritics

2008-08-14 Thread Norberto Meijome
On Thu, 14 Aug 2008 11:34:47 -0400 Steven A Rowe [EMAIL PROTECTED] wrote: [...] The kind of filter Walter is talking about - a generalized language-aware character normalization Solr/Lucene filter - does not yet exist. My guess is that if/when it does materialize, both the Solr and the

Re: Simple Searching Question

2008-08-14 Thread Rob Casson
you're likely not copyField-ing *_facet to text, and we'd need to see what type of field it is to see how it will be analyzed at both search/index time. the default schema.xml file is pretty well documented, so you might want to spend some time looking thru it, and reading the comments... lots of

Re: Simple Searching Question

2008-08-14 Thread Jake Conk
Rob, Actually I am copying *_facet to text. I have the following for copyField in my schema: <copyField source="*_t" dest="text"/> <copyField source="*_facet" dest="text"/> This is my field configuration in my schema: <fields> <field name="id" type="string" indexed="true" stored="true" required="true" />

Re: IndexOutOfBoundsException

2008-08-14 Thread Ian Connor
I seem to be able to reproduce this very easily and the data is medline (so I am sure I can share it if needed with a quick email to check). - I am using fedora: %uname -a Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux %java

Re: IndexOutOfBoundsException

2008-08-14 Thread Yonik Seeley
Since this looks like more of a lucene issue, I've replied in [EMAIL PROTECTED] -Yonik On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor [EMAIL PROTECTED] wrote: I seem to be able to reproduce this very easily and the data is medline (so I am sure I can share it if needed with a quick email to