works at
all.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
From: climbingrose climbingr...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Fri, 17 Apr 2009 16:53:00 +1000
To: solr-user solr-user@lucene.apache.org
Subject: CollapseFilter
Hi all,
I've been trying to return a field of type ExternalFileField in the search
result. Upon examining XMLWriter class, it seems like Solr can't do this out
of the box. Therefore, I've tried to hack Solr to enable this behaviour.
The goal is to call to
Hi Yonik,
I have had a look at ExternalFileField. However, I couldn't figure out how
to include the externally referenced field in the search results. Also,
sorting on this type of field isn't possible, right?
Thanks.
On Sat, Jul 12, 2008 at 2:28 AM, climbingrose [EMAIL PROTECTED]
wrote
Hi all,
Has anyone tried to factor rating/popularity into Solr scoring? For example,
I want documents with more page views to be ranked higher in the search
results. From what I can see, the most difficult thing is that we have to
update the number of page views for each document. With Solr-139,
, Jul 12, 2008 at 1:58 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
See ExternalFileField and BoostedQuery
-Yonik
On Fri, Jul 11, 2008 at 11:47 AM, climbingrose [EMAIL PROTECTED]
wrote:
Hi all,
Has anyone tried to factor rating/popularity into Solr scoring? For
example,
I want documents
You do, I think. Have a look at DirectUpdateHandler2 class.
On Thu, Jul 10, 2008 at 9:16 PM, Gudata [EMAIL PROTECTED] wrote:
Hi,
I want (if possible) to dedicate one machine only for indexing and to be
optimized only for that.
In solrconfig.xml, I have:
- commented all cache statements
-
, write something similar to
EnglishPorterFilterFactory to use it within Solr.
Hope this helps.
Cheers,
Cuong
On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet [EMAIL PROTECTED]
wrote:
Hi Cuong,
On Tue, Jul 1, 2008 at 4:45 AM, climbingrose [EMAIL PROTECTED]
wrote:
I modified the original
Hi all,
Porter stemmer in general is really good. However, there are some cases
where it doesn't work. For example, accountant matches Accountant as
well as Account Manager which isn't desirable. Is it possible to use this
analyser for plural words only? For example:
+Accountant - accountant
Ok, it looks like step 1a in Porter algo does what I need.
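For readers unfamiliar with it, step 1a of the Porter algorithm only handles plural endings (SSES -> SS, IES -> I, SS -> SS, S -> nothing). The following is a minimal Python sketch of that single step, not the Solr filter itself; the function name is made up for illustration, and the lowercasing is an assumption standing in for whatever lowercase filter sits earlier in the analysis chain.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer (plural endings).

    Illustrative helper, not part of Solr or Lucene.
    """
    w = word.lower()  # assumption: case folding happens before stemming
    if w.endswith("sses"):
        return w[:-2]  # caresses -> caress
    if w.endswith("ies"):
        return w[:-2]  # ponies -> poni
    if w.endswith("ss"):
        return w       # caress -> caress (unchanged)
    if w.endswith("s"):
        return w[:-1]  # cats -> cat
    return w


if __name__ == "__main__":
    for term in ("Accountants", "accountant", "account"):
        print(term, "->", porter_step_1a(term))
```

With only this step applied, "accountants" and "accountant" share a stem, while "account" stays distinct, which is the plural-only behaviour asked about in the original question.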
On Mon, Jun 30, 2008 at 6:39 PM, climbingrose [EMAIL PROTECTED]
wrote:
Hi all,
Porter stemmer in general is really good. However, there are some cases
where it doesn't work. For example, accountant matches Accountant as
well
AM, Mike Klaas [EMAIL PROTECTED] wrote:
If you find a solution that works well, I encourage you to contribute it
back to Solr. Plural-only stemming is probably a common need (I've
definitely wanted to use it before).
cheers,
-Mike
On 30-Jun-08, at 2:25 AM, climbingrose wrote:
Ok
interesting but the implementation is in Python though. I think they use
a Hidden Markov Model to label training data, then match records
probabilistically.
On Fri, Jun 27, 2008 at 10:12 PM, Grant Ingersoll [EMAIL PROTECTED]
wrote:
below
On Jun 27, 2008, at 1:18 AM, climbingrose wrote:
Firstly
It depends on your query. The second query is better if you know that
fieldb:bar filtered query will be reused often since it will be cached
separately from the query. The first query occupies one cache entry while
the second one occupies two cache entries, one in queryCache and one in
Just to correct myself: in the last sentence, the first query is better if
fieldb:bar isn't reused often
On Thu, Jun 12, 2008 at 2:02 PM, climbingrose [EMAIL PROTECTED]
wrote:
It depends on your query. The second query is better if you know that
fieldb:bar filtered query will be reused often since
Hi Sachit,
I think what you could do is to create all the core fields of your models
such as username, role, title, body, images... You can name them with a prefix
like user.username, user.role, article.title, article.body... If you want to
dynamically add more fields to your schema, you can use
Hi all,
I'm trying to implement sponsored results in Solr search results similar
to that of Google. We index products from various sites and would like to
allow certain sites to promote their products. My approach is to query a
slave instance to get sponsored results for user queries in addition
it and let you decide.
I have an index containing product entries, for which I created a field called
sponsored words. What I do is boost this field, so when these words are
matched in the query, those products appear first in my results.
2008/6/3 climbingrose [EMAIL PROTECTED]:
Hi all,
I'm trying
Hi Matthias,
How would you prevent the Solr server from being exposed to the outside world with
this javascript client? I prefer running Solr behind a firewall and accessing it
from server-side code.
Cheers.
On Mon, May 26, 2008 at 7:27 AM, Matthias Epheser [EMAIL PROTECTED]
wrote:
Hi users,
As
Probably the easiest way to do this is keep track of the number of items
yourself then retrieve it later on.
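A minimal sketch of that idea, assuming documents are built as plain dictionaries on the client before being sent to Solr. The helper name and the `<field>_count` naming convention are hypothetical; the count field would also need to be declared in the schema.

```python
def with_value_count(doc, field, count_field=None):
    """Return a copy of doc with an extra field holding the number of
    values stored in the (possibly multivalued) field.

    Illustrative client-side helper, not part of any Solr client API.
    """
    count_field = count_field or f"{field}_count"
    values = doc.get(field, [])
    if not isinstance(values, (list, tuple)):
        values = [values]  # treat a single value as a one-element list
    enriched = dict(doc)
    enriched[count_field] = len(values)
    return enriched


if __name__ == "__main__":
    doc = {"id": "1", "tags": ["solr", "lucene", "search"]}
    print(with_value_count(doc, "tags"))
```

Because the count is stored as an ordinary field, it can then be returned, filtered on, or sorted on like any other value.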
On Wed, May 21, 2008 at 7:57 AM, Brian Whitman [EMAIL PROTECTED]
wrote:
Any way to query how many items are in a multivalued field? (Or use a
functionquery against that # or anything?)
Agree. I've been using Solrj on product site for 9 months without any
problem at all. You should probably give it a try instead of dealing with
all those low level details.
On Sun, May 11, 2008 at 4:14 AM, Chris Hostetter [EMAIL PROTECTED]
wrote:
: please post a snippet of Java code to add a
Thanks Chris. I probably have to repost this on the Lucene mailing list.
On Sun, Mar 23, 2008 at 9:49 AM, Chris Hostetter [EMAIL PROTECTED]
wrote:
the topic has come up before on the lucene java lists (although i can't
think of any good search terms to find the old threads .. I can't really
Hi all,
I thought many people would encounter the situation I'm having here.
Basically, we'd like to have a PhraseQuery with minimum should match
property similar to BooleanQuery. Consider the query Senior Java
Developer:
1) I'd like to do a PhraseQuery on Senior Java Developer with a slop of
Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: [EMAIL PROTECTED]
~ The code is willing, but the data is weak. ~
-Original Message-
From: climbingrose [mailto:[EMAIL PROTECTED]
Sent
Hi guys,
I'm running into some problems with accented (UTF-8) languages. I'd love to
hear some ideas about how to use Solr with those languages. Basically, I
want to achieve what Google did with UTF-8 languages.
My requirements include:
1) Accent insensitive search and proper highlighting:
For
I don't think they (Solr developers) have a time frame for the 1.3 release.
However, I've been using the latest code from the trunk and I can tell you
it's quite stable. The only problem is that the documentation sometimes doesn't
cover the latest changes in the code. You'll probably have to dig into the code
I'm using code pulled directly from Subversion.
On Jan 21, 2008 12:34 PM, anuvenk [EMAIL PROTECTED] wrote:
Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip
climbingrose wrote:
I don't think
Good day all Solr users and developers,
May I wish you and your family a merry Xmas and happy new year. Hope that
new year brings you all health, wealth and peace. It's been my pleasure to
be on this mailing list and working with Solr. Thank you all!
--
Cheers,
Cuong Hoang
Make sure that the user running Solr has permission to execute snapshooter.
Also, try ./snapshooter instead of snapshooter.
Good luck.
On Dec 18, 2007 10:57 AM, Sunny Bassan [EMAIL PROTECTED] wrote:
I've set up solrconfig.xml to create a snapshot of an index after doing
an optimize, but the
I think there is an event listener interface for hooking into Solr events
such as post commit, post optimise and open new searcher. I can't remember it
off the top of my head, but if you do a search for *EventListener in Eclipse,
you'll find it.
The Wiki shows how to trigger snapshooter after each commit
Hi Ryan,
I'm using solr with Maven 2 in our project. Here is what my pom.xml looks
like:
<!-- Solrj -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>1.3.0</version>
</dependency>
Since I have all solrj
I don't think you have to. Just try the query on the REST interface and you
will know.
On Dec 5, 2007 9:56 AM, Kasi Sankaralingam [EMAIL PROTECTED] wrote:
Do I need to select the fields in the query that I am trying to sort on?,
for example if I want sort on update date then do I need to
Hi all,
I'm trying to implement a custom UpdateProcessor which requires access to
SolrIndexSearcher. However, I'm constantly running into Too many open
files exception. I'm confused about which is the correct way to get access
to SolrIndexSearcher in UpdateProcessor:
1) req.getSearcher()
2)
Assuming that you have the timestamp field defined:
q=*:*&sort=timestamp desc
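As a rough sketch, the same request can be assembled with URL encoding handled for you. The base URL is an assumption for a local default install, the function name is illustrative, and `rows=1` is added here only so that just the most recent document comes back.

```python
from urllib.parse import urlencode


def latest_document_url(base_url, timestamp_field="timestamp"):
    """Build a select URL that matches everything, sorted newest first.

    Illustrative helper; base_url and the field name are assumptions.
    """
    params = {
        "q": "*:*",                          # match all documents
        "sort": f"{timestamp_field} desc",   # newest timestamp first
        "rows": 1,                           # only the top hit is needed
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr endpoint.
    print(latest_document_url("http://localhost:8983/solr/select"))
```

Note that this only reflects the last committed document if the timestamp field is populated at index time, as the reply assumes.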
On Nov 23, 2007 10:43 PM, Thorsten Scherler
[EMAIL PROTECTED] wrote:
Hi all,
I need to ask solr to return me the id of the last committed document.
Is there a way to achieve this via a standard lucene query or do I
The duplication detection mechanism in Nutch is quite primitive. I
think it uses an MD5 signature generated from the content of a field.
The generation algorithm is described here:
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/crawl/TextProfileSignature.html.
The problem with this
Make sure you have the JDK installed, not just the JRE. Also try setting
JAVA_HOME to the JDK directory.
apt-get install sun-java5-jdk
On Nov 21, 2007 5:50 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Phillip,
I won't go into details, but I'll point out that the Java compiler is called
javac and if memory
Hi Ken,
It's correct that uncommon words are most likely not showing up in the
signature. However, I was trying to say that if two documents have 99%
common tokens and differ in one token with frequency > quantised
frequency, the two resulting hashes are completely different. If you
want true near
Hi David,
Do you use one of the Solr clients available at
http://wiki.apache.org/solr/IntegratingSolr? These clients should
already do all the XML parsing for you. I speak from
Solrj experience.
IMO, your approach is probably the most commonly used when it comes to
pagination. Solr caching
One approach is to extend SynonymFilter so that it reads synonyms from
database instead of a file. SynonymFilter is just a Java class so you
can do whatever you want with it :D. From what I remember, the filter
initialises a list of all input synonyms and stores them in memory.
Therefore, you need
The easiest solution I know is:
<delete><query>id:1 OR id:2 OR ...</query></delete>
If you know that all of these ids can be found by issuing a query, you
can do delete by query:
<delete><query>YOUR_DELETE_QUERY_HERE</query></delete>
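If the list of ids is long, the delete message can be generated rather than typed by hand. This is a small sketch using only the Python standard library; the function name is illustrative, and it assumes the ids contain no characters that need escaping in the query syntax.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids, id_field="id"):
    """Build a delete-by-query XML message that ORs together the given ids.

    Illustrative helper; assumes ids need no query-syntax escaping.
    """
    root = ET.Element("delete")
    query = ET.SubElement(root, "query")
    query.text = " OR ".join(f"{id_field}:{doc_id}" for doc_id in ids)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(delete_by_ids([1, 2, 3]))
```

The resulting string is what would be posted to the update handler, followed by a commit.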
Cheers
On Nov 19, 2007 4:18 PM, Norberto Meijome [EMAIL PROTECTED] wrote:
Hi
Hi all,
I've been so busy the last few days that I haven't replied to this email. I
modified SpellCheckerHandler a while ago to include support for multiword
queries. To be honest, I didn't have time to write unit tests for the code.
However, I deployed it in a production environment and it has been
configurable.
On 10/11/07, climbingrose [EMAIL PROTECTED] wrote:
Hi all,
I've been so busy the last few days that I haven't replied to this email. I
modified SpellCheckerHandler a while ago to include support for multiword
queries. To be honest, I didn't have time to write unit tests for the code
1) On solr.master:
+Edit scripts.conf:
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
+Enable and start rsync:
rsyncd-enable; rsyncd-start
+Run snapshooter:
snapshooter
After running this, you should be able to see a new folder named snapshot.*
in data/index folder.
You can
De: [EMAIL PROTECTED]
A: solr-user@lucene.apache.org
Sujet: Re: Re: Solr replication
Date: Mon, 1 Oct 2007 15:00:46 +0200
Works like a charm. Thanks very much.
cheers
Y.
Message d'origine
Date: Mon, 1 Oct 2007 21:55:30 +1000
De: climbingrose
A: solr-user
I don't think you can with the current Solr because each instance runs in a
separate web app.
On 9/25/07, James liu [EMAIL PROTECTED] wrote:
If I use multiple Solr instances with one index, each will cache individually.
So I wonder, can they share their cache? (They have the same config.)
--
regards
jl
--
Hi all,
I've been struggling to find a good way to synchronize Solr with a large
number of records. We collect our data from a number of sources and each
source produces around 50,000 docs. Each of these documents has a sourceId
field indicating the source of the document. Now assuming we're
Hi Erik,
So in your case #1, documents are reindexed with this scheme - so if you
truly need to skip a reindexing for some reason (why, though?) you'll
need to come up with some other mechanism. [perhaps update could be
enhanced to allow ignoring a duplicate id rather than reindexing?]
It's
I think you can use the CollapseFilter to collapse on version field.
However, I think you need to modify the CollapseFilter code to sort by
version and get the latest version returned.
On 9/13/07, Adrian Sutton [EMAIL PROTECTED] wrote:
Hi all,
The documents we're indexing are versioned and
using persistent http connections? Are you threadedly indexing?
cheers,
-Mike
Paul Sundling
-Original Message-
From: climbingrose [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 12:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50
Thanks Karl. I'll check it out!
On 8/18/07, karl wettin [EMAIL PROTECTED] wrote:
I updated LUCENE-626 last night. It should now run smooth without
LUCENE-550, but smoother with.
Perhaps it is something you can use.
12 aug 2007 kl. 14.24 skrev climbingrose:
I'm happy to contribute code
://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
.
On 8/11/07, Pieter Berkel [EMAIL PROTECTED] wrote:
On 11/08/07, climbingrose [EMAIL PROTECTED] wrote:
The spellchecker handler doesn't seem to work with multi-word query. For
example, when I
I'm having the date boosting function as well. I'm using this function:
F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around
10,000 documents added in one day, rord(createDate) returns very
different values for the same createDate. For example, the last document
added with
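To make the effect concrete: Solr's recip(x,m,a,b) function computes a/(m*x+b). The sketch below plugs in the constants from the function above and two illustrative reverse ordinals (1 and 10,000, chosen here as assumptions, not taken from a real index) to show how far apart the boost values can end up.

```python
def recip(x, m, a, b):
    """Reciprocal function in the same form as Solr's recip: a / (m*x + b)."""
    return a / (m * x + b)


# Constants from the function quoted above: recip(rord(...), 1, 1000, 1000).
newest = recip(1, 1, 1000, 1000)            # reverse ordinal 1
ten_thousandth = recip(10_000, 1, 1000, 1000)  # reverse ordinal 10,000

if __name__ == "__main__":
    print(f"rord=1      -> {newest:.4f}")
    print(f"rord=10000  -> {ten_thousandth:.4f}")
```

With these numbers the first value is roughly 0.999 and the second roughly 0.091, so two documents whose ordinals are 10,000 apart get very different boosts. Raising the constants a and b flattens the curve.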
Yeah. How stable is the patch, Karl? Is it possible to use it in a production
environment?
On 8/12/07, karl wettin [EMAIL PROTECTED] wrote:
11 aug 2007 kl. 10.36 skrev climbingrose:
There is an issue on
Lucene issue tracker regarding multi-word spellchecker:
https://issues.apache.org/jira
know if this is still not clear, I probably will
add
it to the wiki page soon.
cheers,
Tristan
On 7/9/07, climbingrose [EMAIL PROTECTED] wrote:
Thanks for the quick reply. However, I'm still not able to set up
spellchecker. Solr does create a spell directory under data
irrelevant suggestions for the location
part since the number of terms in location is generally much smaller
compared with that of description. Any ideas?
Thanks.
On 8/11/07, climbingrose [EMAIL PROTECTED] wrote:
The spellchecker handler doesn't seem to work with multi-word query. For
example
OK, I just need to define 2 spellcheckers in solrconfig.xml for my purpose.
On 8/11/07, climbingrose [EMAIL PROTECTED] wrote:
After looking at the SpellChecker code, I realised that it only supports
single-word. I made a very naive modification of SpellCheckerHandler to get
multi-word support
Hi all,
I think there might be something wrong with the date time rounding up. I
tried this query: q=*:*&fq=listedDate:[NOW/DAY-1DAY TO *] which I think
should return results since yesterday. So if today is 9th of August, it
should return all results from the 8th of August. However, Solr returns
I think I have the same question as Arnaud. For example, my dismax query has
qf=title^5 description^2. Now if I search for Java developer, I want to
make sure that the results have at least java or developer in the title.
Is this possible with dismax query?
On 7/30/07, Chris Hostetter [EMAIL
Hi all,
I'm puzzling over how to boost a date field in a DisMax query. Atm, my qf is
title^5 summary^1. However, what I really want to do is to allow documents
with the latest listedDate to have a better score. For example, documents with
listedDate:[NOW-1DAY TO *] have additional score over documents
] wrote:
I think in this case you can use a bq (Boost Query) so you can apply
this
boost to the range you want.
<str name="bq">your_date_field:[NOW/DAY-24HOURS TO NOW]^10.0</str>
This example will boost your documents with date within the last 24h.
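It may help to see which interval that range actually covers. In Solr date math, NOW/DAY rounds down to the start of the current day, so the lower bound is midnight of the previous day rather than exactly 24 hours ago. The sketch below mimics that arithmetic in Python, assuming UTC; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def bq_date_range(now):
    """Return (lower, upper) for the range [NOW/DAY-24HOURS TO NOW].

    Illustrative helper mimicking the date math; assumes `now` is in UTC.
    """
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day - timedelta(hours=24), now


if __name__ == "__main__":
    now = datetime(2007, 7, 19, 14, 45, tzinfo=timezone.utc)
    lower, upper = bq_date_range(now)
    print(lower.isoformat(), "to", upper.isoformat())
```

So depending on the time of day, the boosted window spans between 24 and 48 hours.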
Regards,
Daniel
On 19/7/07 14:45, climbingrose
Just tried the bq approach and it works beautifully. Exactly what I was
looking for. Still, I'd like to know which approach is preferred. Thanks
again guys.
On 7/20/07, climbingrose [EMAIL PROTECTED] wrote:
Thanks for both answers. Which one is better in terms of performance? bq
or bf
Thanks for the answer Chris. The DisMax query handler is just amazing!
On 7/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: Just tried the bq approach and it works beautifully. Exactly what I was
: looking for. Still, I'd like to know which approach is the preferred?
Thanks
: again guys.
i
Hi all,
My facet browsing performance was decent on my system until I added my
custom Analyser. Initially, I facetted on the title field, which is of the default
string type (no analysers, tokenisers...) and got quick responses (first
query is just under 1s, subsequent queries are 0.1s). I created a
to 100 if you have the memory
Optimizing your index should also speed up faceting (but that is a lot
of facets).
-Yonik
On 7/16/07, climbingrose [EMAIL PROTECTED] wrote:
Hi all,
My facet browsing performance has been decent on my system until I add
my
custom Analyser. Initially, I facetted
, climbingrose [EMAIL PROTECTED] wrote:
Thanks Yonik. In my case, there is only one title field per document
so is
there a way to force Solr to work the old way? My analyser doesn't break
up
the title field into multiple tokens. It only tries to format the
field
value (to lower case, remove unwanted
Thanks for the suggestion Chris. I modified SimpleFacets to check for
[f.foo.]facet.field.type==(single|multi)
and the performance has been improved significantly.
On 7/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: ...but i don't understand why both checking isTokenized() ...
shouldn't
:
Hi all,
I've been using Solr for the last few projects and the experience has been
great. I'll post the link to the website once it finishes. Just have a few
questions regarding synonyms and parameters encoding:
1) Are multi-word synonyms possible now in Solr? For example, can I have
things like
Coincidentally, I have a very similar use case. Thanks for the advice.
On 7/8/07, Yonik Seeley [EMAIL PROTECTED] wrote:
On 7/7/07, Brian Whitman [EMAIL PROTECTED] wrote:
I have been trying to plan out a history function for Solr. When I
update a document with an existing unique key, I would like
Thanks Yonik. I think both of the conditions hold true for our application
;).
On 3/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
On 3/26/07, climbingrose [EMAIL PROTECTED] wrote:
I'm developing an application that potentially creates thousands of
dynamic
fields. Does anyone know if large
Hi all,
I'm developing an application that potentially creates thousands of dynamic
fields. Does anyone know if a large number of dynamic fields will degrade
Solr performance?
Thanks.
--
Regards,
Cuong Hoang
Hi all,
Is it true that Solr is mainly used for applications that rarely change the
underlying data? As I understand, if you submit new data or modify existing
data on Solr server, you would have to refresh the cache somehow to
display the updated data. If my application frequently gets new
Hi all,
Am I right that we can only have one schema per solr server? If so, how
would you deal with the issue of submitting completely different data models
(such as clothes and cars)?
Thanks.
--
Regards,
Cuong Hoang
I probably need to visualise my models:
MobileInfo (1)(1...*) SellingItem
MobileInfo has many fields to describe the characteristics of a mobile phone
model (color, size..). SellingItem is an instance of MobileInfo that is
currently sold by a user. So in the