Hello, I'm in exactly the same situation as you. I've got some structured
subjects ( as subjects:main subject/sub subject/sub sub subject ) and want to
search them as literals from a given level (subjects:main subject/*). As you
know, subjects:main subject/* doesn't work (but it should, shouldn't
I just saw an e-mail from Yonik suggesting escaping the space. I know
so little about Solr that all I can do is parrot Yonik...
Erick
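For reference, Yonik's suggestion amounts to backslash-escaping the space so the query parser treats the whole path as a single token. A minimal sketch in Python (the helper name is purely illustrative):

```python
def escape_solr_spaces(value):
    # Backslash-escape spaces so the Lucene query parser keeps a
    # path-like value together as one token instead of splitting
    # it into separate query clauses.
    return value.replace(" ", "\\ ")

query = "subjects:" + escape_solr_spaces("main subject") + "/*"
# query is now: subjects:main\ subject/*
```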
On 8/8/07, Matthew Runo [EMAIL PROTECTED] wrote:
OK.
So a followup question..
?q=department_exact:Apparel%3EMen's%
result status=1java.io.FileNotFoundException:
/usr/local/bin/apache-solr/enr/solr/data/index/_16ik.tii (Too many open
files)
When I'm importing, this is the error I get. I know it's vague and
obscure. Can someone suggest where to start? I'll buy a bag of M&Ms
(not peanut) for anyone who can
Hi!
say I have 300 csv files that I need to index.
Each one holds millions of lines (each line is a few fields separated by
commas)
Each csv file represents a different domain of data (e.g. file1 is
computers, file2 is flowers, etc.)
There is no indication of the domain ID in the data
You're a gentleman and a scholar. I will donate the M&Ms to myself :).
Can you tell me from this snippet of my solrconfig.xml what I might
tweak to make this more betterer?
-KH
<indexDefaults>
<!-- Values here affect all index writers and act as a default unless
overridden. -->
You could try committing updates more frequently, or maybe optimising the
index beforehand (and even during!). I imagine you could also change the
Solr config, if you have access to it, to tweak indexing (or index creation)
parameters - http://wiki.apache.org/solr/SolrConfigXml should be of use
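The "commit more frequently" idea above can be sketched as chunking on the loader side, so each POST to /update carries a bounded batch and a commit can be issued between batches. A toy sketch (the chunk size is illustrative, not a recommendation):

```python
def batches(rows, size):
    # Chunk the input so each POST carries `size` docs and the
    # loader can commit between chunks instead of accumulating
    # one giant uncommitted run.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# 10 rows in batches of 4 -> chunks of 4, 4, and 2
chunk_sizes = [len(c) for c in batches(range(10), 4)]
```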
I inherited an existing (working) solr indexing script that runs like
this:
Python script queries the mysql DB then calls bash script
Bash script performs a curl POST submit to solr
We're injecting about 1000 records / minute (constantly), frequently
pushing the edge of our CPU / RAM
Condensing the loader into a single executable sounds right if
you have performance problems. ;-)
You could also try adding multiple docs in a single post if you
notice your problems are with tcp setup time, though if you're
doing localhost connections that should be minimal.
If you're already
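The multiple-docs-per-post suggestion means building one &lt;add&gt; message with many &lt;doc&gt; elements, which is Solr's standard update format. A self-contained sketch of assembling such a payload (the field names are made up):

```python
from xml.sax.saxutils import escape

def add_payload(docs):
    # Build a single <add> message carrying many documents, so one
    # POST to /update replaces many tiny per-document posts.
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

payload = add_payload([{"id": 1, "title": "flowers"},
                       {"id": 2, "title": "computers"}])
```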
What we're looking for is a way to inject *without* using
curl, or wget, or any other http-based communication. We'd
like for the HTTP daemon to only handle search requests, not
indexing requests on top of them.
Plus, I have to believe there's a faster way to get documents
into solr/lucene than
(re)building the index separately (ie. on a different computer) and then
replacing the active index may be an option.
David Whalen wrote:
What we're looking for is a way to inject *without* using
curl, or wget, or any other http-based communication. We'd
like for the HTTP daemon to only
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote:
2: Is there a way to inject into solr without using POST / curl /
http?
Check http://wiki.apache.org/solr/EmbeddedSolr
There's examples in java and cocoa to use the DirectSolrConnection
class, querying and updating solr w/o a web
If it's a contention between search and indexing, separate them
via a query-slave and an index-master.
--cw
On 8/9/07, David Whalen [EMAIL PROTECTED] wrote:
What we're looking for is a way to inject *without* using
curl, or wget, or any other http-based communication. We'd
like for the
On 8/9/07, David Whalen [EMAIL PROTECTED] wrote:
Plus, I have to believe there's a faster way to get documents
into solr/lucene than using curl
One issue with HTTP is latency. You can get around that by adding
multiple documents per request, or by using multiple threads
concurrently.
You
If you check out the documentation for mergeFactor, you'll find that adjusting
it downward can lower the number of open files. Just remember that it is a
speed tradeoff, so only lower it as much as you need to in order to stop
getting the "too many open files" errors.
See this section:
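For reference, the knobs mentioned in this thread sit in the &lt;indexDefaults&gt; section of solrconfig.xml; a sketch with purely illustrative values (tune for your own hardware):

```xml
<indexDefaults>
  <!-- fewer segments merged at once means fewer files held open;
       lower values trade indexing speed for fewer descriptors -->
  <mergeFactor>4</mergeFactor>
  <!-- the compound file format also cuts the open-file count -->
  <useCompoundFile>true</useCompoundFile>
</indexDefaults>
```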
On 8/9/07, Siegfried Goeschl [EMAIL PROTECTED] wrote:
+) my colleague just finished a database import service running within
the servlet container to avoid writing out the data to the file system
and transmitting it over HTTP.
Most people doing this read data out of the database and construct
Hi,
I noticed the first index update after I restart my jboss server always
fail with the exception below. Any update after that works fine. Does
anyone know what the problem is? The solr version I'm using is solr1.2
Thanks
Xuesong
2007-08-09 11:41:44,559 ERROR [STDERR] Aug 9, 2007 11:41:44 AM
Hi -
Just looking at synonyms, and had a couple of questions.
1) For some of my synonyms, it seems to make sense to simply replace the
original word with the other (e.g. theatre => theater, so searches for
either will find either). For others, I want to add an alternate term while
preserving the
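For what it's worth, those two behaviours correspond to the two rule forms in synonyms.txt, consumed by solr.SynonymFilterFactory (the terms below are purely illustrative):

```text
# explicit mapping: theatre is rewritten to theater
theatre => theater

# equivalence list: with expand="true" on the filter, each term
# is kept and the alternates are added alongside it
tv, television
```

The filter is declared in schema.xml along the lines of `<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>`.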
On 9-Aug-07, at 7:52 AM, Ard Schrijvers wrote:
ulimit -n 8192
Unless you have an old, creaky box, I highly recommend simply upping
your filedesc cap.
-Mike
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote:
Hmm.. I just tried the following three queries...
/?q=department_exact:Apparel>Men's Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas...
(no results)
/?q=department_exact:Apparel>Men's\
Here you go.. I thought that string wasn't munged, so I used that...
<field name="department" type="text" indexed="true" stored="true"/>
<field name="department_exact" type="string" indexed="true" stored="true"/>
<copyField source="department" dest="department_exact"/>
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote:
Here you go.. I thought that string wasn't munged, so I used that...
<field name="department" type="text" indexed="true" stored="true"/>
<field name="department_exact" type="string" indexed="true" stored="true"/>
<copyField source="department" dest="department_exact"/>
Is this a native feature, or do we need to get creative with scp from
one server to the other?
If it's a contention between search and indexing, separate them
via a query-slave and an index-master.
--cw
On 8/9/07, Kevin Holmes [EMAIL PROTECTED] wrote:
Python script queries the mysql DB then calls bash script
Bash script performs a curl POST submit to solr
For the most up-to-date solr client for python, check out
https://issues.apache.org/jira/browse/SOLR-216
-Yonik
Yes, we've reindexed several times. Here are three sample result sets..
1 - ?q=department_exact:Apparel>Men's Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas
2 - ?q=department_exact:Apparel>Men's\ Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas
3 -
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote:
Yes, we've reindexed several times. Here are three sample result sets..
1 - ?q=department_exact:Apparel>Men's Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas
2 - ?q=department_exact:Apparel>Men's\
On 8/9/07, Yonik Seeley [EMAIL PROTECTED] wrote:
They translate to different queries.
But can I see the XML output for 1 and 2 with debugQuery=on&indent=on
appended?
Or perhaps wt=python would be less confusing, seeing that there
are '&' chars in there that would otherwise be escaped.
Sure thing!
Heres 1, and 2.
1 - just a space.
2 - a \ .
++
| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833
++
On Aug 9, 2007, at 1:14 PM, Yonik Seeley
Hm, I don't see any attachments. I'm forwarding them to you directly.
Would anyone else like to see them?
++
| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833
Feel free to run some queries yourself. We opened the firewall for
this box...
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's\%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
++
|
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote:
Feel free to run some queries yourself. We opened the firewall for
this box...
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's\%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
OK, so this query
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
The same exact query, with... wait..
Wow. I'm making myself look like an idiot.
I swear that these queries didn't work the first time I ran them...
On 8/9/07, Matthew Runo [EMAIL PROTECTED] wrote:
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
The same exact query, with... wait..
Wow. I'm making myself look like an idiot.
I swear that
Hi all, I'd like to provide a blurb of documents matching a search in
the case when there is no text highlighted. I assumed that perhaps the
highlighter would give me back the first few words in a document if this
occurred, but it doesn't. My conundrum is that I'd rather not grab the
whole
On 8/9/07, Benjamin Higgins [EMAIL PROTECTED] wrote:
Hi all, I'd like to provide a blurb of documents matching a search in
the case when there is no text highlighted. I assumed that perhaps the
highlighter would give me back the first few words in a document if this
occurred, but it doesn't.
On 9-Aug-07, at 2:10 PM, Benjamin Higgins wrote:
Hi all, I'd like to provide a blurb of documents matching a search in
the case when there is no text highlighted. I assumed that perhaps
the
highlighter would give me back the first few words in a document if
this
occurred, but it doesn't.
This may be obvious but I can't get my head straight. Is there a way
to return a list of matching words that a record got matched against?
For instance:
record_a: ruby, solr, mysql, rails
record_b: solr, java
Then ?q=solr+OR+rails would return the matched words for the records
record_a: solr,
I'm adding a field to be the source of the spellcheck database. Since that
is its only job, it has raw text lower-cased, de-Latin1'd, and
de-duplicated.
Since it is only for the spellcheck DB, it does not need to keep duplicates.
I specified it as multiValued="false" and used copyField from a
Thanks Mike. I didn't think of creating a blurb beforehand, but that's
a great solution. I'll probably do that. Yonik, I can still add a JIRA
issue if you'd like, though.
Ben
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 09, 2007 2:32 PM
To:
Hi again,
It'd be nice to know what the starting line number is for highlighted
snippets. I imagine others might find it useful to know the starting
byte offset. Is there an easy way to add this in? I'm not afraid of
hacking the source if it's not too involved.
Thanks.
Ben
Hi,
I have built 2 solr instance - one is example and the other is
ca_companies.
The ca_companies solr instance is working fine, but example is not
working...
In the admin page, /solr/admin, for example instance, it shows that
Cwd=/rpt/src/apache-solr-1.2.0/ca_companies/solr/conf
http://wiki.apache.org/solr/EmbeddedSolr
Following the example on connecting to the Index directly without using
HTTP, I tried to optimize by passing the true flag to the
CommitUpdateCommand.
When optimizing an index with Lucene directly it doubles the size of the
index temporarily and then
Here are the Catalina/localhost/ files
For example instance
<Context docBase="/rpt/src/apache-solr-1.2.0/dist/solr.war"
    debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
      value="/rpt/src/apache-solr-1.2.0/example/solr"
      override="true" />
</Context>
For ca_companies
On 8/9/07, Thiago Jackiw [EMAIL PROTECTED] wrote:
This may be obvious but I can't get my head straight. Is there a way
to return a list of matching words that a record got matched against?
Unfortunately no... lucene doesn't provide that capability with
standard queries.
You could do it (slower)
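Yonik's "slower" route amounts to running one query per term and recording which terms hit each document. A toy sketch against in-memory data (not real Solr calls; the record names come from the question above):

```python
records = {
    "record_a": {"ruby", "solr", "mysql", "rails"},
    "record_b": {"solr", "java"},
}

def matched_terms(query_terms, records):
    # Simulate the per-term-query approach: for each record, keep
    # the subset of query terms it actually contains.
    hits = {}
    for name, terms in records.items():
        found = terms.intersection(query_terms)
        if found:
            hits[name] = sorted(found)
    return hits

result = matched_terms({"solr", "rails"}, records)
# record_a matched both terms, record_b only "solr"
```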
Jython is a Python interpreter implemented in Java. (I have a lot of Python
code.)
Total throughput in the servlet is very sensitive to the total number of
servlet sockets available vs. the number of CPUs.
The different analysers have very different performance.
You might leave some data in
On 8/9/07, Lance Norskog [EMAIL PROTECTED] wrote:
I'm adding a field to be the source of the spellcheck database. Since that
is its only job, it has raw text lower-cased, de-Latin1'd, and
de-duplicated.
Since it is only for the spellcheck DB, it does not need to keep duplicates.
Duplicate
The current working directory (Cwd) is the directory from which you started
the Tomcat server and is not dependent on the Solr instance configurations.
So as long as SolrHome is correct for each Solr instance, you shouldn't have
a problem.
cheers,
Piete
On 10/08/07, Jae Joo [EMAIL PROTECTED]
It should probably be configurable: (1) return nothing if no match, (2)
substitute with an alternate field, (3) return first sentence or N
number of tokens.
-Sean
Yonik Seeley wrote on 8/9/2007, 5:50 PM:
On 8/9/07, Benjamin Higgins [EMAIL PROTECTED] wrote:
Thanks Mike. I didn't think of
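Sean's option (3), falling back to the first N tokens when the highlighter returns nothing, is easy to sketch client-side (the function name and token count are illustrative):

```python
def blurb(highlighted, full_text, n_tokens=10):
    # Fallback policy: return the highlighter's snippet when there
    # is one, otherwise the first n_tokens of the raw stored field.
    if highlighted:
        return highlighted
    return " ".join(full_text.split()[:n_tokens])

text = "Solr is an open source search server built on Lucene"
fallback = blurb(None, text, 5)
# fallback is: "Solr is an open source"
```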
On Thu, 9 Aug 2007 15:23:03 -0700
Lance Norskog [EMAIL PROTECTED] wrote:
Underlying this all, you have a sneaky network performance problem. Your
successive posts do not reuse a TCP socket. Obvious: re-opening a new socket
each post takes time. Not obvious: your server has sockets building up
If we have a field spellcheck_db, and have two copyField lines for it:
<fieldType name="spellcheck" ...> Basically the text type without
stemming...
<field name="title" type="string" />
<field name="description" type="string" />
<field name="spellcheck_db" multiValued="false"
Good day,
danc86 of #lucene gave me the answer - I was not storing the fields :-)
Thanks,
Franz
On 8/9/07, Ryan McKinley [EMAIL PROTECTED] wrote:
[QUESTION]
What could be the problem? .Or what else can I do to debug this problem?
In general 'luke' is a great tool to figure out
Maybe there's a different way, in which path-like values like this are
treated explicitly.
I use a similar approach to Matthew at www.colfes.com, where all pages are
generated from Lucene searches according to filters on a couple of
hierarchical categories ('spaces'), i.e. subject and