RE: Best use of wildcard searches

2007-08-09 Thread Pierre-Yves LANDRON
Hello, I'm in exactly the same situation as you. I've got some structured subjects (as subjects:main subject/sub subject/sub sub subject) and want to search them as literals from a given level (subjects:main subject/*). As you know, subjects:main subject/* doesn't work (but it should, shouldn't

Re: Best use of wildcard searches

2007-08-09 Thread Erick Erickson
I just saw an e-mail from Yonik suggesting escaping the space. I know so little about Solr that all I can do is parrot Yonik... Erick On 8/8/07, Matthew Runo [EMAIL PROTECTED] wrote: OK. So a followup question.. ?q=department_exact:Apparel%3EMen's%
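The escaping suggestion above can be sketched in a few lines of Python. This is only an illustration, assuming the field is an untokenized string field and that the space is the only character causing trouble; the helper names prefix_query and as_q_param are made up for this example, and a real client would need to escape the other Lucene query-syntax characters as well.

```python
from urllib.parse import quote


def prefix_query(field, prefix):
    """Build a prefix (trailing-wildcard) query with spaces backslash-escaped.

    Only backslashes and spaces are escaped here; other query-syntax
    characters are deliberately left alone in this sketch.
    """
    escaped = prefix.replace("\\", "\\\\").replace(" ", "\\ ")
    return f"{field}:{escaped}*"


def as_q_param(query):
    """Percent-encode the query so it can be placed in a URL as q=..."""
    return "q=" + quote(query, safe="")


if __name__ == "__main__":
    q = prefix_query("department_exact", "Apparel>Men's Apparel>Jackets")
    print(q)
    print(as_q_param(q))
```

Without the backslash, the query parser treats the space as a separator between two clauses, so the wildcard would only apply to the second half of the value.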

Too many open files

2007-08-09 Thread Kevin Holmes
<result status="1">java.io.FileNotFoundException: /usr/local/bin/apache-solr/enr/solr/data/index/_16ik.tii (Too many open files)</result> When I'm importing, this is the error I get. I know it's vague and obscure. Can someone suggest where to start? I'll buy a bag of M&Ms (not peanut) for anyone who can

question: how to divide the indexing into separate domains

2007-08-09 Thread Ben Shlomo, Yatir
Hi! Say I have 300 csv files that I need to index. Each one holds millions of lines (each line is a few fields separated by commas). Each csv file represents a different domain of data (e.g., file1 is computers, file2 is flowers, etc.). There is no indication of the domain ID in the data

RE: Too many open files

2007-08-09 Thread Kevin Holmes
You're a gentleman and a scholar. I will donate the M&Ms to myself :). Can you tell me from this snippet of my solrconfig.xml what I might tweak to make this more betterer? -KH <indexDefaults> <!-- Values here affect all index writers and act as a default unless overridden. -->

RE: Too many open files

2007-08-09 Thread Jonathan Woods
You could try committing updates more frequently, or maybe optimising the index beforehand (and even during!). I imagine you could also change the Solr config, if you have access to it, to tweak indexing (or index creation) parameters - http://wiki.apache.org/solr/SolrConfigXml should be of use

Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Kevin Holmes
I inherited an existing (working) solr indexing script that runs like this: a Python script queries the mysql DB, then calls a bash script; the bash script performs a curl POST submit to solr. We're injecting about 1000 records / minute (constantly), frequently pushing the edge of our CPU / RAM

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
Condensing the loader into a single executable sounds right if you have performance problems. ;-) You could also try adding multiple docs in a single post if you notice your problems are with tcp setup time, though if you're doing localhost connections that should be minimal. If you're already

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread David Whalen
What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the HTTP daemon to only handle search requests, not indexing requests on top of them. Plus, I have to believe there's a faster way to get documents into solr/lucene than

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Tobin Cataldo
(re)building the index separately (ie. on a different computer) and then replacing the active index may be an option. David Whalen wrote: What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the HTTP daemon to only

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Brian Whitman
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote: 2: Is there a way to inject into solr without using POST / curl / http? Check http://wiki.apache.org/solr/EmbeddedSolr There's examples in java and cocoa to use the DirectSolrConnection class, querying and updating solr w/o a web

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
If it's a contention between search and indexing, separate them via a query-slave and an index-master. --cw On 8/9/07, David Whalen [EMAIL PROTECTED] wrote: What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, David Whalen [EMAIL PROTECTED] wrote: Plus, I have to believe there's a faster way to get documents into solr/lucene than using curl One issue with HTTP is latency. You can get around that by adding multiple documents per request, or by using multiple threads concurrently. You
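The "multiple documents per request" idea can be sketched as follows. This is a minimal illustration in Python, assuming the records are already available as dictionaries (for example, rows read from the database); build_add_message and batches are hypothetical helper names, and the batch size is something to tune, not a recommendation. The message uses Solr's XML update format, with one doc element per record inside a single add element.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Build one XML <add> message holding every record in `records`.

    Each record is a dict of field name -> value; values are converted
    to strings and XML-escaped by ElementTree.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def batches(records, size):
    """Yield consecutive slices of `records` with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


if __name__ == "__main__":
    rows = [{"id": i, "title": f"record {i}"} for i in range(5)]
    for batch in batches(rows, 2):
        # Each printed message would be the body of a single POST.
        print(build_add_message(batch))
```

Posting one message per batch instead of one per record means the per-request latency is paid once per batch.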

RE: Too many open files

2007-08-09 Thread Stu Hood
If you check out the documentation for mergeFactor, you'll find that adjusting it downward can lower the number of open files. Just remember that it is a speed tradeoff, and only lower it as much as you need to to stop getting the too many files errors. See this section:

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, Siegfried Goeschl [EMAIL PROTECTED] wrote: +) my colleague just finished a database import service running within the servlet container to avoid writing out the data to the file system and transmitting it over HTTP. Most people doing this read data out of the database and construct

always fail to update the first time after I restart the server

2007-08-09 Thread Xuesong Luo
Hi, I noticed the first index update after I restart my JBoss server always fails with the exception below. Any update after that works fine. Does anyone know what the problem is? The Solr version I'm using is 1.2. Thanks, Xuesong 2007-08-09 11:41:44,559 ERROR [STDERR] Aug 9, 2007 11:41:44 AM

Synonym questions

2007-08-09 Thread Tom Hill
Hi - Just looking at synonyms, and had a couple of questions. 1) For some of my synonyms, it seems to make sense to simply replace the original word with the other (e.g. theatre => theater, so searches for either will find either). For others, I want to add an alternate term while preserving the

Re: Too many open files

2007-08-09 Thread Mike Klaas
On 9-Aug-07, at 7:52 AM, Ard Schrijvers wrote: ulimit -n 8192 Unless you have an old, creaky box, I highly recommend simply upping your filedesc cap. -Mike
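Before raising the cap it can help to see what the current one is. A small sketch using Python's standard resource module (Unix only); note that it reports the limits of the Python process that runs it, so it only reflects Solr's limits if it is run as the same user from the same kind of shell, which is an assumption here. The describe helper is just for display.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

The soft limit is the one that ulimit -n changes for the current shell; it cannot be raised above the hard limit without extra privileges.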

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote: Hmm.. I just tried the following three queries... /?q=department_exact:Apparel>Men's? Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas... (no results) /?q=department_exact:Apparel>Men's\

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Here you go.. I thought that string wasn't munged, so I used that... <field name="department" type="text" indexed="true" stored="true"/> <field name="department_exact" type="string" indexed="true" stored="true"/> <copyField source="department" dest="department_exact"/>

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote: Here you go.. I thought that string wasn't munged, so I used that... <field name="department" type="text" indexed="true" stored="true"/> <field name="department_exact" type="string" indexed="true" stored="true"/> <copyField source="department" dest="department_exact"/>

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Kevin Holmes
Is this a native feature, or do we need to get creative with scp from one server to the other? If it's a contention between search and indexing, separate them via a query-slave and an index-master. --cw

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, Kevin Holmes [EMAIL PROTECTED] wrote: Python script queries the mysql DB then calls bash script Bash script performs a curl POST submit to solr For the most up-to-date solr client for python, check out https://issues.apache.org/jira/browse/SOLR-216 -Yonik

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Yes, we've reindexed several times. Here are three sample result sets.. 1 - ?q=department_exact:Apparel>Men's? Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas 2 - ?q=department_exact:Apparel>Men's\ Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas 3 -

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote: Yes, we've reindexed several times. Here are three sample result sets.. 1 - ?q=department_exact:Apparel>Men's? Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas 2 - ?q=department_exact:Apparel>Men's\

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: They translate to different queries. But can I see the XML output for 1 and 2 with debugQuery=on&indent=on appended? Or perhaps with wt=python would be less confusing seeing that there are '>' chars in there that would otherwise be escaped.

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Sure thing! Here's 1, and 2. 1 - just a space. 2 - a \ . ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 9, 2007, at 1:14 PM, Yonik Seeley

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Hm, I don't see any attachments, I'm forwarding them to you directly. Would anyone else like to see them? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Feel free to run some queries yourself. We opened the firewall for this box... http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's\%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python ++ |

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote: Feel free to run some queries yourself. We opened the firewall for this box... http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's\%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python OK, so this query

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python The same exact query, with... wait.. Wow. I'm making myself look like an idiot. I swear that these queries didn't work the first time I ran them...

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote: http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python The same exact query, with... wait.. Wow. I'm making myself look like an idiot. I swear that

Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Benjamin Higgins
Hi all, I'd like to provide a blurb of documents matching a search in the case when there is no text highlighted. I assumed that perhaps the highlighter would give me back the first few words in a document if this occurred, but it doesn't. My conundrum is that I'd rather not grab the whole

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Yonik Seeley
On 8/9/07, Benjamin Higgins [EMAIL PROTECTED] wrote: Hi all, I'd like to provide a blurb of documents matching a search in the case when there is no text highlighted. I assumed that perhaps the highlighter would give me back the first few words in a document if this occurred, but it doesn't.

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Mike Klaas
On 9-Aug-07, at 2:10 PM, Benjamin Higgins wrote: Hi all, I'd like to provide a blurb of documents matching a search in the case when there is no text highlighted. I assumed that perhaps the highlighter would give me back the first few words in a document if this occurred, but it doesn't.

Returning a list of matching words

2007-08-09 Thread Thiago Jackiw
This may be obvious but I can't get my head straight. Is there a way to return a list of matching words that a record got matched against? For instance: record_a: ruby, solr, mysql, rails record_b: solr, java Then ?q=solr+OR+rails would return the matched words for the records record_a: solr,

Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
I'm adding a field to be the source of the spellcheck database. Since that is its only job, it has raw text lower-cased, de-Latin1'd, and de-duplicated. Since it is only for the spellcheck DB, it does not need to keep duplicates. I specified it as multiValued="false" and used copyField from a

RE: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Benjamin Higgins
Thanks Mike. I didn't think of creating a blurb beforehand, but that's a great solution. I'll probably do that. Yonik, I can still add a JIRA issue if you'd like, though. Ben -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, August 09, 2007 2:32 PM To:

Is it possible to know from where in the field highlighted text comes from?

2007-08-09 Thread Benjamin Higgins
Hi again, It'd be nice to know what the starting line number is for highlighted snippets. I imagine others might find it useful to know the starting byte offset. Is there an easy way to add this in? I'm not afraid of hacking the source if it's not too involved. Thanks. Ben

tomcat and solr multiple instances

2007-08-09 Thread Jae Joo
Hi, I have built 2 solr instances - one is example and the other is ca_companies. The ca_companies solr instance is working fine, but example is not working... In the admin page, /solr/admin, for the example instance, it shows that Cwd=/rpt/src/apache-solr-1.2.0/ca_companies/solr/conf

EmbeddedSolr and optimize

2007-08-09 Thread Sundling, Paul
http://wiki.apache.org/solr/EmbeddedSolr Following the example on connecting to the Index directly without using HTTP, I tried to optimize by passing the true flag to the CommitUpdateCommand. When optimizing an index with Lucene directly it doubles the size of the index temporarily and then

RE: tomcat and solr multiple instances

2007-08-09 Thread Jae Joo
Here are the Catalina/localhost/ files. For the example instance: <Context docBase="/rpt/src/apache-solr-1.2.0/dist/solr.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/rpt/src/apache-solr-1.2.0/example/solr" override="true" /> </Context> For ca_companies

Re: Returning a list of matching words

2007-08-09 Thread Yonik Seeley
On 8/9/07, Thiago Jackiw [EMAIL PROTECTED] wrote: This may be obvious but I can't get my head straight. Is there a way to return a list of matching words that a record got matched against? Unfortunately no... lucene doesn't provide that capability with standard queries. You could do it (slower)
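Since standard queries do not report which terms matched, one workaround for small result sets is to work it out on the client after the search returns. The sketch below is a plain Python illustration using the records from the question; it assumes the words of each record are stored and returned with the result, it is not the approach described in the reply above, and it ignores analysis steps such as stemming. The function name matched_words is made up for the example.

```python
def matched_words(query_terms, records):
    """For each record, list the words that also appear in the query.

    `records` maps a record name to its list of words. Comparison is
    case-insensitive and the record's own word order is preserved.
    """
    wanted = {term.lower() for term in query_terms}
    return {
        name: [word for word in words if word.lower() in wanted]
        for name, words in records.items()
    }


if __name__ == "__main__":
    records = {
        "record_a": ["ruby", "solr", "mysql", "rails"],
        "record_b": ["solr", "java"],
    }
    for name, words in matched_words(["solr", "rails"], records).items():
        print(name + ":", ", ".join(words))
```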

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Lance Norskog
Jython is a Python interpreter implemented in Java. (I have a lot of Python code.) Total throughput in the servlet is very sensitive to the total number of servlet sockets available vs. the number of CPUs. The different analysers have very different performance. You might leave some data in

Re: Multivalued fields and the 'copyField' operator

2007-08-09 Thread Yonik Seeley
On 8/9/07, Lance Norskog [EMAIL PROTECTED] wrote: I'm adding a field to be the source of the spellcheck database. Since that is its only job, it has raw text lower-cased, de-Latin1'd, and de-duplicated. Since it is only for the spellcheck DB, it does not need to keep duplicates. Duplicate

Re: tomcat and solr multiple instances

2007-08-09 Thread Pieter Berkel
The current working directory (Cwd) is the directory from which you started the Tomcat server and is not dependent on the Solr instance configurations. So as long as SolrHome is correct for each Solr instance, you shouldn't have a problem. cheers, Piete On 10/08/07, Jae Joo [EMAIL PROTECTED]

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Sean Timm
It should probably be configurable: (1) return nothing if no match, (2) substitute with an alternate field, (3) return first sentence or N number of tokens. -Sean Yonik Seeley wrote on 8/9/2007, 5:50 PM: On 8/9/07, Benjamin Higgins [EMAIL PROTECTED] wrote: Thanks Mike. I didn't think of
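The three options listed above can be mocked up on the client side while no such setting exists in the highlighter. A rough Python sketch, assuming the client already has the highlight snippets and the stored fields for each hit; the mode names, the "summary" field name, and both function names are invented for this example and do not correspond to any Solr parameter.

```python
def first_tokens(text, max_tokens=30):
    """Return the first `max_tokens` whitespace-separated tokens of `text`."""
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return " ".join(tokens)
    return " ".join(tokens[:max_tokens]) + "..."


def blurb(snippets, doc, mode="first_tokens", field="body",
          alternate_field="summary", max_tokens=30):
    """Pick the text to display for one hit.

    snippets -- highlighted fragments for this hit (may be empty)
    doc      -- dict of stored fields for this hit
    mode     -- "none", "alternate", or "first_tokens"
    """
    if snippets:
        return " ... ".join(snippets)
    if mode == "none":
        return ""
    if mode == "alternate":
        return doc.get(alternate_field, "")
    if mode == "first_tokens":
        return first_tokens(doc.get(field, ""), max_tokens)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    hit = {"body": "one two three four five", "summary": "a short summary"}
    print(blurb([], hit, mode="first_tokens", max_tokens=3))
    print(blurb([], hit, mode="alternate"))
```

The "first_tokens" mode still requires fetching the whole field, which is the cost the original question wanted to avoid; precomputing a short summary field at index time and using the "alternate" mode sidesteps that.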

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Norberto Meijome
On Thu, 9 Aug 2007 15:23:03 -0700 Lance Norskog [EMAIL PROTECTED] wrote: Underlying this all, you have a sneaky network performance problem. Your successive posts do not reuse a TCP socket. Obvious: re-opening a new socket each post takes time. Not obvious: your server has sockets building up

RE: Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
If we have a field spellcheck_db, and have two copyField lines for it: <fieldType name="spellcheck" ...> Basically the text type without stemming... <field name="title" type="string" /> <field name="description" type="string" /> <field name="spellcheck_db" multiValued="false"

Re: [newbie] how to debug the schema?

2007-08-09 Thread Franz Allan Valencia See
Good day, danc86 of #lucene gave me the answer - I was not storing the fields :-) Thanks, Franz On 8/9/07, Ryan McKinley [EMAIL PROTECTED] wrote: [QUESTION] What could be the problem? Or what else can I do to debug this problem? In general 'luke' is a great tool to figure out

RE: Best use of wildcard searches

2007-08-09 Thread Jonathan Woods
Maybe there's a different way, in which path-like values like this are treated explicitly. I use a similar approach to Matthew at www.colfes.com, where all pages are generated from Lucene searches according to filters on a couple of hierarchical categories ('spaces'), i.e. subject and