Re: solr performance for documents with hundreds of fields

2008-04-25 Thread Umar Shah
I am just wondering, because having 200 fields seems like too much (to me). I want to know whether people actually have schemas like this and how well they perform. On Thu, Apr 24, 2008 at 5:10 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Are you actually seeing performance problems or just

Re: Solr with Auto-suggest

2008-04-25 Thread Rantjil Bould
Nice. Great help. I have added the following fields to hold tokens. <fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter

Re: Updating in Solr.SOLR-139

2008-04-25 Thread nutchvf
Hi!!! I have already realized the mistake. My id field was generated as a copy of another field called url. In other words: <copyField source="url" dest="id"/>. It seems that things did not work well when the id field was generated as a copy of another one. Now I have changed the

How to extract terms associated with a field

2008-04-25 Thread Rantjil Bould
Hello Group, I have a field named prefix1, which is a copy of another field called content. The field type of prefix1 is <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1"> <analyzer type="index"> <tokenizer

Custom Filter. Pass field thru regular expression to match.

2008-04-25 Thread surfer10
My data, found with Solr, needs to be tested against a regular expression formed at query time. To avoid sending big data chunks via HTTP, I've suggested that results be verified on the Solr side before they are sent to the client. I've heard that we can assign a custom Java function for filtering

Re: GSA - Solr

2008-04-25 Thread Lukas Vlcek
Otis, may I ask how you go about handling user access privileges? I mean, you need some mechanism to get user privileges from the corporate environment (LDAP, for example) and filter returned hits using a document access policy. Also, you may be caching this information as well for

Delete's increase while adding new documents

2008-04-25 Thread Tim Mahy
Hi all, we send xml add document messages to Solr and we notice something very strange. We autocommit at 10 documents, starting from a total clean index (removed the data folder), when we start uploading we notice that the docsPending is going up but also that the deletesPending is going up

Re: Caching of DataImportHandler's Status Page

2008-04-25 Thread Sean Timm
Noble-- You should probably include SOLR-505 in your DataImportHandler patch. -Sean Noble Paul നോബിള്‍ नोब्ळ् wrote: It is caused by the new caching feature in Solr. The caching is done at the browser level. Solr just sends appropriate headers. We had raised an issue to disable that. BTW

Re: solr performance for documents with hundreds of fields

2008-04-25 Thread Erik Hatcher
That is well within the boundaries of what Solr/Lucene can handle. But, of course, it depends on what you're doing with those fields too. Putting 200 fields into a dismax qf specification, for example, would surely be bad for performance :) But querying on only a handful of fields or
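
As a rough illustration of the point about keeping the dismax qf specification small, the sketch below builds request parameters that query only a handful of boosted fields. The field names (title, body), the boosts, and the use of qt=dismax to select the handler are assumptions made for the example, not details from this thread.

```python
from urllib.parse import urlencode


def dismax_params(user_query, boosted_fields):
    """Build request parameters for a dismax query over a few fields.

    boosted_fields maps a field name to its boost, e.g. {"title": 2.0}.
    """
    # qf is a space-separated list of field^boost entries.
    qf = " ".join(f"{name}^{boost}" for name, boost in boosted_fields.items())
    # Assumption: a request handler registered under the name "dismax".
    return {"qt": "dismax", "q": user_query, "qf": qf}


# Hypothetical field names; a schema may hold 200 fields while qf names two.
params = dismax_params("solr performance", {"title": 2.0, "body": 1.0})
query_string = urlencode(params)
```

The resulting query_string can be appended to a select URL; the remaining fields of the schema stay out of the query entirely.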

Re: Solr with Auto-suggest

2008-04-25 Thread Ryan McKinley
On Apr 25, 2008, at 3:02 AM, Rantjil Bould wrote: Nice. Great help. I have added the following fields to hold tokens. <fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/>

Reindexing mode for solr

2008-04-25 Thread Jonathan Ariel
Hi, Is there any way to tell solr to load in a kind of reindexing mode, which won't open a new searcher after every commit, etc? This is just when you don't have it available to query because you just want to reindex all the information. What do you think? Jonathan

Re: Reindexing mode for solr

2008-04-25 Thread Otis Gospodnetic
Don't think so. But you reindex on the master and query on the slave. If your concern is that the index will be sent to the search slave while you are still reindexing, just don't commit until you are done. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original

Re: Caching of DataImportHandler's Status Page

2008-04-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yes, we are waiting for the patch to get committed. --Noble On Fri, Apr 25, 2008 at 5:36 PM, Sean Timm [EMAIL PROTECTED] wrote: Noble-- You should probably include SOLR-505 in your DataImportHandler patch. -Sean Noble Paul നോബിള്‍ नोब्ळ् wrote: It is caused by the new caching

Help required with external value source SOLR-351

2008-04-25 Thread Howard Lee
I'm trying to get this new feature to work without much success. I've completed the following steps. 1) downloaded the latest nightly build 2) added the following to schema.xml <fieldtype name="file" keyField="job_id" defVal="1" stored="false" indexed="false"

Re: Reindexing mode for solr

2008-04-25 Thread Walter Underwood
In our setup, snapshooter is triggered on optimize, not commit. We can commit all we want on the master without making a snapshot. That only happens when we optimize. The new Searcher is the biggest performance impact for us. We don't have that many documents (~250K), so copying an entire index

Re: Reindexing mode for solr

2008-04-25 Thread Jonathan Ariel
You're right. But I'm concerned about some Max Number of Searchers Reached that I usually get when reindexing every once in a while. On Fri, Apr 25, 2008 at 12:28 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Don't think so. But you reindex on the master and query on the slave. If your

Re: Reindexing mode for solr

2008-04-25 Thread Otis Gospodnetic
Like Wunder said, you can reindex every once in a while all you want, just don't create index snapshots when you commit (disable the postCommit hook in solrconfig.xml) or don't commit at all until you are done. Or call optimize at the end and enable the postOptimize hook. Otis -- Sematext --

Re: solr performance for documents with hundreds of fields

2008-04-25 Thread Otis Gospodnetic
What Erik said ;) 200 fields is not a problem. Things to watch out for are: - more index files and thus more open file descriptors if you use the non-compound Lucene index format and are working with non-optimized indices (on master - optimize your index before it gets to slaves) - slower merging

Re: GSA - Solr

2008-04-25 Thread Otis Gospodnetic
The GSA - Solr conversion I mentioned has not yet happened and may not even include doc access right functionality. However, when I implemented things like that in the past, I used custom trickery, not a general open framework. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: GSA - Solr

2008-04-25 Thread Walter Underwood
Custom trickery is pretty standard for access controls in search. A couple of the high points from deploying Ultraseek: three incompatible single sign-on systems in one company and a system that controlled which links were shown instead of access to the docs themselves. The latter amazed me. If

DisMax and pf

2008-04-25 Thread Otis Gospodnetic
Hello, I was looking at DisMax and playing with its pf parameter. I created a sample index with field content. I set pf to: content^2.0 and expected to see (content:my query here)^2.0 in the query (debugQuery=true). However, I only got (content:my query here) -- no boost. Is this a bug or

RE: Solr with Auto-suggest

2008-04-25 Thread Lance Norskog
This is what the spellchecker does. It makes a separate Lucene index of n-gram letters and searches those. Works pretty well and it is outside the main index. I did an experimental variation indexing word pairs as phrases, and it worked well too. Lance Norskog -Original Message- From: Ryan
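
To make the idea of an index of n-gram letters concrete, here is a minimal sketch that splits a word into character n-grams. The function name and the gram sizes (2 to 3) are assumptions chosen for illustration; this is not the spellchecker's actual code or configuration.

```python
def char_ngrams(word, min_n=2, max_n=3):
    """Return every character n-gram of word, shortest grams first."""
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the word.
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


grams = char_ngrams("solr")  # ['so', 'ol', 'lr', 'sol', 'olr']
```

A misspelled or partial input shares many of these grams with the correct word, which is what lets a lookup against the gram index surface it as a suggestion.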

Re: Reindexing mode for solr

2008-04-25 Thread Mike Klaas
On 25-Apr-08, at 7:05 AM, Jonathan Ariel wrote: Hi, Is there any way to tell solr to load in a kind of reindexing mode, which won't open a new searcher after every commit, etc? This is just when you don't have it available to query because you just want to reindex all the information.

Re: Delete's increase while adding new documents

2008-04-25 Thread Mike Klaas
On 25-Apr-08, at 4:27 AM, Tim Mahy wrote: Hi all, we send xml add document messages to Solr and we notice something very strange. We autocommit at 10 documents, starting from a total clean index (removed the data folder), when we start uploading we notice that the docsPending is

Re: MultiThreaded Document Loader?

2008-04-25 Thread Mike Klaas
On 24-Apr-08, at 2:57 PM, oleg_gnatovskiy wrote: Hello. I was wondering if Solr has some kind of a multi-threaded document loader? I've been using post.sh (curl) to post documents to my Solr server, and it's pretty slow. I know it should be pretty easy to write one up, but I was just
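
A multi-threaded loader of the kind asked about here could be sketched roughly as follows. The update URL, the helper names, and the worker count are assumptions for the example; the sketch sends each XML update message on a worker thread instead of posting them one at a time.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Assumption: Solr's update handler at the example server's default address.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def post_xml(xml_body, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def load_documents(xml_messages, post=post_xml, workers=4):
    """Send each update message on a worker thread; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post, xml_messages))
```

The post argument is injectable so the loader can be exercised without a running server. A commit message would still need to be sent once after all documents have been posted.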

Re: Standard vs. DisMaxQueryHandler

2008-04-25 Thread David Smiley @MITRE.org
I am frustrated that I have to pick between the two because I want both. The way I look at it, there should be a more configurable query handler which allows me to dismax if I want to, and pick a parser for the user's query (like the flexible one used by the standard query handler, or the more