Hi all,
I've been struggling to find a good way to synchronize Solr with a large
number of records. We collect our data from a number of sources and each
source produces around 50,000 docs. Each of these documents has a sourceId
field indicating the source of the document. Now assuming we're
Hi,
I am new to Solr and am t
--
Blog @ http://blizzardzblogs.blogspot.com
Hi,
I am new to Solr and am trying to implement a solution for indexing and
searching using Embedded Solr.
However, I have a question about the Solr schema:
How do I generate the schema fields programmatically, instead of defining
them in schema.xml?
Regards,
Venkat
[apologies for sending a WIP
Cuong,
I accomplished this (in Collex) by attaching a batch number to each
document. When indexing a batch (or source), a GUID is generated and
every document from that batch/source gets that same identifier
attached to it. At the end of the indexing run, I delete everything
with that
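A minimal sketch of the batch-identifier idea, in Python. The message is cut off before it says exactly what gets deleted, so the query below is only one plausible reading: remove documents from the same source that do not carry the new batch identifier. The field name batchId and the helper names are hypothetical; sourceId comes from the original question. The delete message uses Solr's XML delete-by-query form.

```python
import uuid
from xml.sax.saxutils import escape


def new_batch_id() -> str:
    """Generate one identifier shared by every document in this indexing run."""
    return uuid.uuid4().hex


def tag_documents(docs, batch_id):
    """Return copies of the documents with the (hypothetical) batchId field set."""
    return [dict(doc, batchId=batch_id) for doc in docs]


def stale_delete_message(source_id, batch_id):
    """Build a delete-by-query message for documents of this source that
    were not written by the current batch (an assumed interpretation)."""
    query = f'sourceId:"{source_id}" AND NOT batchId:"{batch_id}"'
    # escape() protects &, < and > so the query is safe inside the XML body
    return f"<delete><query>{escape(query)}</query></delete>"
```

The message would be posted to /update after the batch has been indexed, followed by a commit.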
Hi there!
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application consists of
approximately 1.4 million records of varying size for the work record,
and another database of
Hi Erik,
So in your case #1, documents are reindexed with this scheme - so if you
truly need to skip a reindexing for some reason (why, though?) you'll
need to come up with some other mechanism. [perhaps update could be
enhanced to allow ignoring a duplicate id rather than reindexing?]
It's
Hi,
What methods are available for user authentication? I'm using Jetty and
php/curl and Basic HTTP Auth does not seem to work. I just need something
simple so that only the Admin can add, update or delete documents.
Regards,
Jennifer Seaman
Add/Update, Commit/Optimize, Delete, and Delete by query, in Solr are done
using the url /update. So should be able to protect that url at the
container level outside of Solr. If you want you can protect the query url
/select or the admin pages too. Container level authentication is
transparent
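A small Python sketch of what a client does once /update is protected with HTTP Basic authentication at the container level. It only builds the request; the URL, user name, and password in the usage note are placeholders, and the function name is made up for illustration.

```python
import base64
import urllib.request


def build_update_request(url, username, password, xml_body):
    """Build a POST to a Solr update URL carrying a Basic auth header."""
    # Basic auth is "user:password" encoded as base64
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "text/xml; charset=utf-8",
        },
        method="POST",
    )
```

For example, build_update_request("http://localhost:8983/solr/update", "admin", "secret", "<commit/>") returns a request that can be passed to urllib.request.urlopen. Solr itself never sees the credentials; the container checks them first.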
When you say outside of Solr do you mean outside of solr.war? We finally
got php/curl working with jetty's Basic Authentication. We had to unpack and
repack solr.war to edit web.xml and it would have been nice to use some
other method.
Hi,
oops, the URIEncoding was lost during the update to tomcat 6.0.14.
Thanks for the advice.
But now I am really curious. After indexing the document from scratch,
queries for "this" and "is" work fine, whereas queries for "really"
and "fünny" do not return the result. Fünnily
In the wiki:
http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956
where it talks about the hl.snippets parameter, it says that it can be
overridden on a per-field basis. I haven't been able to find any
information in the documentation or on the
On Sep 14, 2007, at 12:33 PM, Nathaniel E. Powell wrote:
http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956
where it talks about the hl.snippets parameter, it says that it can be
overridden on a per-field basis. I haven't been able to find any
On Sep 14, 2007, at 8:19 AM, Thompson,Roger wrote:
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application
consists of
approximately 1.4 million records of varying size
On 14-Sep-07, at 5:19 AM, Thompson,Roger wrote:
Hi there!
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application
consists of
approximately 1.4 million records of
Hi Marc,
Are you using the same stemmer on your queries that you use when indexing?
Try the analysis function in the admin UI, to see how things are stemmed for
indexing vs. querying. If they don't match for really and fünny, and do
match for kraßen, then that's your problem.
Tom
On 9/14/07,
Hi Tom,
thanks for your response -- and sorry for the newbie question, may sound
somehow silly ;-) . Here the quick result of the analysis UI:
Index for really: 5* really. Query for really: 5* really, 2* realli
(from: EnglishPorterFilterFactory {protected=protwords.txt},
Hi Marc,
The searches are going to look for an exact match of the query (after
analysis) in the index (after analysis).
So, realli will not match really.
So you want to have the same stemmer (probably not the English one, given
your examples) in both the index analyzer and the query analyzer.
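A toy Python illustration of why the analyzers have to agree. The stemmer here is a deliberately crude stand-in (it only rewrites a trailing "y" to "i") and is not Solr's EnglishPorterFilterFactory; all function names are made up for the example.

```python
def toy_stem(token):
    """Crude stand-in for a stemmer: rewrite a trailing 'y' to 'i'."""
    return token[:-1] + "i" if token.endswith("y") else token


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]


def matches(indexed_text, query_text, stem_index, stem_query):
    """A query matches only if every analyzed query term is present,
    exactly, among the analyzed index terms."""
    index_terms = set(analyze(indexed_text, stem_index))
    return all(term in index_terms for term in analyze(query_text, stem_query))
```

With stemming applied only at query time, the query term becomes "realli" while the index still holds "really", so matches("this is really funny", "really", False, True) is False. With the same setting on both sides it is True.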
I meant outside of the Solr code. You are right that it is still in the
Solr war file since you will need to put the authentication configuration
into web.xml.
Bill
On 9/14/07, jenix [EMAIL PROTECTED] wrote:
When you say outside of Solr do you mean outside of solr.war? We finally
got
Hi Tom,
thanks for your professional response -- works fine and looks good :-).
Since I am playing around with mixed texts (English and German), I do
not have any idea whether or not an EnglishPorter will be useful for
German texts. But I will find it out by playing around ;-)
Regards from
You can try the public/private key certificate system. You deploy it to
jetty/tomcat somehow, and curl has options to send it.
We haven't tried this. The authentication happens at the http container
level, not in the solr config.
-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED]
You could index into multiple fields with different analyzers
and search all of them.
text_en: uses English stemmer
text_de: uses German stemmer
text_exact: no stemming
text_strip: uses ISOLatin1AccentFilter
You can search all of these and put different boosts on them,
with higher boosts for
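A short Python sketch of searching several analyzed copies of the same text with per-field boosts, using the Lucene query syntax field:term^boost. The boost values in the usage note are arbitrary placeholders: the message is cut off before it says which fields should get more weight. The function name is hypothetical.

```python
def multi_field_query(term, boosts):
    """Build one OR query over several fields, each with its own boost.

    `boosts` maps field name to boost factor. The term is used as-is,
    so it should be a single word without query-syntax special characters.
    """
    return " OR ".join(f"{field}:{term}^{boost}" for field, boost in boosts.items())
```

For example, multi_field_query("funny", {"text_exact": 4, "text_en": 2}) produces a query string that can be sent as the q parameter to /select.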
I apologize for missing that. I added an anchor at the top and a link
where the word overrides is in the wiki.
Thanks,
-Nathan
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, September 14, 2007 10:53 AM
To: solr-user@lucene.apache.org
Subject: Re:
: When you say outside of Solr do you mean outside of solr.war? We finally
: got php/curl working with jetty's Basic Authentication. We had to unpack and
: repack solr.war to edit web.xml and it would have been nice to use some
: other method.
it should not be necessary to unpack the war ... you
Hi Mike,
Thanks for clarifying what has been a bit of a black box to me.
A couple of questions, to increase my understanding, if you don't mind.
If I am only using fields with multiValued=false, with a type of string
or integer (untokenized), does solr automatically use approach 2? Or is
this
On 14-Sep-07, at 3:38 PM, Tom Hill wrote:
Hi Mike,
Thanks for clarifying what has been a bit of a black box to me.
A couple of questions, to increase my understanding, if you don't
mind.
If I am only using fields with multiValued=false, with a type of
string
or integer (untokenized),
Hi,
I am not sure if this can be done. Say that periodically there is a
big batch to be indexed, and we don't want to replicate the data before
the batch is completely indexed. We would like to avoid a post-commit
hook, as we will be committing periodically to reduce memory usage
and we
: number of records. We collect our data from a number of sources and each
: source produces around 50,000 docs. Each of these document has a sourceId
: field indicating the source of the document. Now assuming we're indexing all
: documents from SourceA (sourceId=SourceA), majority of these docs
You could MD4 the parts you care about, store that, fetch it and compare.
If there is a reliable timestamp, you could use that. But that would be
app-dependent.
In general, you need to store some info about each source document
and figure out whether it is new. This gets much hairier with a web
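A minimal Python sketch of the hash-and-compare idea. It uses SHA-256 from hashlib in place of the MD4 mentioned above, simply because it is always available there; any stable digest would do. The stored mapping stands in for wherever the previous fingerprints are kept, and the field names in the usage note are hypothetical.

```python
import hashlib


def fingerprint(doc, fields):
    """Hash only the fields that matter for deciding whether to reindex."""
    h = hashlib.sha256()
    for name in fields:
        # Separators keep ("ab", "c") distinct from ("a", "bc")
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def needs_reindex(doc, fields, stored):
    """True if the document is new or its fingerprint differs from the stored one."""
    return stored.get(doc["id"]) != fingerprint(doc, fields)
```

A caller would check needs_reindex(doc, ["title", "body"], stored) before sending the document to Solr, and record the new fingerprint under doc["id"] once indexing succeeds.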