Re: CollapseFilter with the latest Solr in trunk

2009-04-19 Thread climbingrose
works at all. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: climbingrose climbingr...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Fri, 17 Apr 2009 16:53:00 +1000 To: solr-user solr-user@lucene.apache.org Subject: CollapseFilter

Best way to return ExternalFileField in the results

2008-07-15 Thread climbingrose
Hi all, I've been trying to return a field of type ExternalFileField in the search result. Upon examining XMLWriter class, it seems like Solr can't do this out of the box. Therefore, I've tried to hack Solr to enable this behaviour. The goal is to call to

Re: Document rating/popularity and scoring

2008-07-14 Thread climbingrose
Hi Yonik, I have had a look at ExternalFileField. However, I couldn't figure out how to include the externally referenced field in the search results. Also, sorting on this type of field isn't possible, right? Thanks. On Sat, Jul 12, 2008 at 2:28 AM, climbingrose [EMAIL PROTECTED] wrote

Document rating/popularity and scoring

2008-07-11 Thread climbingrose
Hi all, Has anyone tried to factor rating/popularity into Solr scoring? For example, I want documents with more page views to be ranked higher in the search results. From what I can see, the most difficult thing is that we have to update the number of page views for each document. With Solr-139,

Re: Document rating/popularity and scoring

2008-07-11 Thread climbingrose
, Jul 12, 2008 at 1:58 AM, Yonik Seeley [EMAIL PROTECTED] wrote: See ExternalFileField and BoostedQuery -Yonik On Fri, Jul 11, 2008 at 11:47 AM, climbingrose [EMAIL PROTECTED] wrote: Hi all, Has anyone tried to factor rating/popularity into Solr scoring? For example, I want documents

Re: Do I need Searcher on indexing machine

2008-07-10 Thread climbingrose
You do, I think. Have a look at DirectUpdateHandler2 class. On Thu, Jul 10, 2008 at 9:16 PM, Gudata [EMAIL PROTECTED] wrote: Hi, I want (if possible) to dedicate one machine only for indexing and to be optimized only for that. In solrconfig.xml, I have: - commented all cache statements -

Re: Limit Porter stemmer to plural stemming only?

2008-07-01 Thread climbingrose
, write something similar to EnglishPorterFilterFactory to use it within Solr. Hope this helps. Cheers, Cuong On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet [EMAIL PROTECTED] wrote: Hi Cuong, On Tue, Jul 1, 2008 at 4:45 AM, climbingrose [EMAIL PROTECTED] wrote: I modified the original

Limit Porter stemmer to plural stemming only?

2008-06-30 Thread climbingrose
Hi all, Porter stemmer in general is really good. However, there are some cases where it doesn't work. For example, accountant matches Accountant as well as Account Manager which isn't desirable. Is it possible to use this analyser for plural words only? For example: +Accountant - accountant

Re: Limit Porter stemmer to plural stemming only?

2008-06-30 Thread climbingrose
Ok, it looks like step 1a in Porter algo does what I need. On Mon, Jun 30, 2008 at 6:39 PM, climbingrose [EMAIL PROTECTED] wrote: Hi all, Porter stemmer in general is really good. However, there are some cases where it doesn't work. For example, accountant matches Accountant as well

Re: Limit Porter stemmer to plural stemming only?

2008-06-30 Thread climbingrose
AM, Mike Klaas [EMAIL PROTECTED] wrote: If you find a solution that works well, I encourage you to contribute it back to Solr. Plural-only stemming is probably a common need (I've definitely wanted to use it before). cheers, -Mike On 30-Jun-08, at 2:25 AM, climbingrose wrote: Ok

Re: Suggestion for short text matching using dictionary

2008-06-27 Thread climbingrose
interesting but the implementation is in Python though. I think they use a Hidden Markov Model to label training data and then match records probabilistically. On Fri, Jun 27, 2008 at 10:12 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: below On Jun 27, 2008, at 1:18 AM, climbingrose wrote: Firstly

Re: searching only within allowed documents

2008-06-11 Thread climbingrose
It depends on your query. The second query is better if you know that the fieldb:bar filtered query will be reused often, since it will be cached separately from the query. The first query occupies one cache entry while the second one occupies two cache entries, one in queryCache and one in

Re: searching only within allowed documents

2008-06-11 Thread climbingrose
Just to correct myself: in the last sentence, the first query is better if fieldb:bar isn't reused often. On Thu, Jun 12, 2008 at 2:02 PM, climbingrose [EMAIL PROTECTED] wrote: It depends on your query. The second query is better if you know that fieldb:bar filtered query will be reused often since

Re: Multiple Schema File

2008-06-04 Thread climbingrose
Hi Sachit, I think what you could do is to create all the core fields of your models such as username, role, title, body, images... You can name them with prefix like user.username, user.role, article.title, article.body... If you want to dynamically add more fields to your schema, you can use

Ideas on how to implement sponsored results

2008-06-03 Thread climbingrose
Hi all, I'm trying to implement sponsored results in Solr search results similar to that of Google. We index products from various sites and would like to allow certain sites to promote their products. My approach is to query a slave instance to get sponsored results for user queries in addition

Re: Ideas on how to implement sponsored results

2008-06-03 Thread climbingrose
it and let you decide. I have an index containing product entries, for which I created a field called sponsored words. I boost this field, so when these words are matched in the query those products appear first in my results. 2008/6/3 climbingrose [EMAIL PROTECTED]: Hi all, I'm trying

Re: Announcement of Solr Javascript Client

2008-05-25 Thread climbingrose
Hi Matthias, How would you prevent Solr server from being exposed to outside world with this javascript client? I prefer running Solr behind firewall and access it from server side code. Cheers. On Mon, May 26, 2008 at 7:27 AM, Matthias Epheser [EMAIL PROTECTED] wrote: Hi users, As

Re: query for number of field entries in a multivalued field?

2008-05-23 Thread climbingrose
Probably the easiest way to do this is keep track of the number of items yourself then retrieve it later on. On Wed, May 21, 2008 at 7:57 AM, Brian Whitman [EMAIL PROTECTED] wrote: Any way to query how many items are in a multivalued field? (Or use a functionquery against that # or anything?)

Re: Simple Solr POST using java

2008-05-10 Thread climbingrose
Agreed. I've been using Solrj on a production site for 9 months without any problem at all. You should probably give it a try instead of dealing with all those low-level details. On Sun, May 11, 2008 at 4:14 AM, Chris Hostetter [EMAIL PROTECTED] wrote: : please post a snippet of Java code to add a

Re: Minimum should match and PhraseQuery

2008-03-23 Thread climbingrose
Thanks Chris. I probably have to repost this on the Lucene mailing list. On Sun, Mar 23, 2008 at 9:49 AM, Chris Hostetter [EMAIL PROTECTED] wrote: the topic has come up before on the lucene java lists (although i can't think of any good search terms to find the old threads .. I can't really

Minimum should match and PhraseQuery

2008-03-19 Thread climbingrose
Hi all, I thought many people would encounter the situation I'm having here. Basically, we'd like to have a PhraseQuery with minimum should match property similar to BooleanQuery. Consider the query Senior Java Developer: 1) I'd like to do a PhraseQuery on Senior Java Developer with a slop of

Re: Accented search

2008-03-11 Thread climbingrose
Services 4-30 Cameron Library University of Alberta Libraries Edmonton, Alberta Canada T6G 2J8 Phone: (780) 492-3743 Fax: (780) 492-9243 e-mail: [EMAIL PROTECTED] ~ The code is willing, but the data is weak. ~ -Original Message- From: climbingrose [mailto:[EMAIL PROTECTED] Sent

Accented search

2008-03-10 Thread climbingrose
Hi guys, I'm running into some problems with accented (UTF-8) languages. I'd love to hear some ideas about how to use Solr with those languages. Basically, I want to achieve what Google did with UTF-8 languages. My requirements include: 1) Accent insensitive search and proper highlighting: For

Re: solr 1.3

2008-01-20 Thread climbingrose
I don't think they (Solr developers) have a time frame for the 1.3 release. However, I've been using the latest code from the trunk and I can tell you it's quite stable. The only problem is that the documentation sometimes doesn't cover the latest changes in the code. You'll probably have to dig into the code

Re: solr 1.3

2008-01-20 Thread climbingrose
I'm using code pulled directly from Subversion. On Jan 21, 2008 12:34 PM, anuvenk [EMAIL PROTECTED] wrote: Thanks. Would this be the latest code from the trunk that you mentioned? http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip climbingrose wrote: I don't think

Merry Christmas and happy new year

2007-12-24 Thread climbingrose
Good day all Solr users & developers, May I wish you and your family a merry Xmas and happy new year. Hope that new year brings you all health, wealth and peace. It's been my pleasure to be on this mailing list and working with Solr. Thank you all! -- Cheers, Cuong Hoang

Re: Issues with postOptimize

2007-12-17 Thread climbingrose
Make sure that the user running Solr has permission to execute snapshooter. Also, try ./snapshooter instead of snapshooter. Good luck. On Dec 18, 2007 10:57 AM, Sunny Bassan [EMAIL PROTECTED] wrote: I've set up solrconfig.xml to create a snapshot of an index after doing an optimize, but the

Re: Replication hooks

2007-12-10 Thread climbingrose
I think there is an event listener interface for hooking into Solr events such as post commit, post optimise and open new searcher. I can't remember it off the top of my head, but if you do a search for *EventListener in Eclipse, you'll find it. The Wiki shows how to trigger snapshooter after each commit

Re: solr + maven?

2007-12-05 Thread climbingrose
Hi Ryan, I'm using solr with Maven 2 in our project. Here is what my pom.xml looks like: <!-- Solrj --> <dependency> <groupId>org.apache.solr</groupId> <artifactId>solr-solrj</artifactId> <version>1.3.0</version> </dependency> Since I have all solrj

Re: SOLR sorting - question

2007-12-04 Thread climbingrose
I don't think you have to. Just try the query on the REST interface and you will know. On Dec 5, 2007 9:56 AM, Kasi Sankaralingam [EMAIL PROTECTED] wrote: Do I need to select the fields in the query that I am trying to sort on?, for example if I want sort on update date then do I need to

Access to SolrIndexSearcher in UpdateProcessor

2007-12-02 Thread climbingrose
Hi all, I'm trying to implement a custom UpdateProcessor which requires access to SolrIndexSearcher. However, I'm constantly running into Too many open files exception. I'm confused about which is the correct way to get access to SolrIndexSearcher in UpdateProcessor: 1) req.getSearcher() 2)

Re: Get last updated/committed document

2007-11-23 Thread climbingrose
Assuming that you have the timestamp field defined: q=*:*&sort=timestamp desc On Nov 23, 2007 10:43 PM, Thorsten Scherler [EMAIL PROTECTED] wrote: Hi all, I need to ask solr to return me the id of the last committed document. Is there a way to achieve this via a standard lucene query or do I
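The query above can be assembled programmatically so that the parameter separator and URL encoding are handled for you. The following is only a sketch: the function name latest_document_query is hypothetical, the field name timestamp is taken from the reply, and the rows=1 limit is an added assumption to return just the newest document.

```python
from urllib.parse import urlencode


def latest_document_query(timestamp_field="timestamp", rows=1):
    """Build a Solr query string that lists the newest document first.

    `timestamp_field` must be an indexed field in the schema (assumption).
    `rows` limits the response to the newest N documents (assumption).
    """
    params = {
        "q": "*:*",                          # match every document
        "sort": f"{timestamp_field} desc",  # newest first
        "rows": rows,
    }
    # urlencode joins the pairs with '&' and percent-encodes the values
    return urlencode(params)


print(latest_document_query())
```

The resulting string would be appended to the select handler URL of your own Solr instance.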

Re: Near Duplicate Documents

2007-11-21 Thread climbingrose
The duplicate detection mechanism in Nutch is quite primitive. I think it uses an MD5 signature generated from the content of a field. The generation algorithm is described here: http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/crawl/TextProfileSignature.html. The problem with this

Re: Help with Debian solr/jetty install?

2007-11-21 Thread climbingrose
Make sure you have the JDK installed, not just the JRE. Also try setting the JAVA_HOME directory. apt-get install sun-java5-jdk On Nov 21, 2007 5:50 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Phillip, I won't go into details, but I'll point out that the Java compiler is called javac and if memory

Re: Near Duplicate Documents

2007-11-21 Thread climbingrose
Hi Ken, It's correct that uncommon words are most likely not showing up in the signature. However, I was trying to say that if two documents have 99% common tokens and differ in one token with frequency > quantised frequency, the two resulting hashes are completely different. If you want true near
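The point about hashes can be illustrated with a toy example. This is not Nutch's TextProfileSignature; it is a deliberately simplified stand-in (the function exact_signature and the sample documents are hypothetical) that hashes a token profile with MD5 to show that a one-token difference yields an unrelated digest.

```python
import hashlib


def exact_signature(tokens):
    """Toy signature: MD5 over the sorted, de-duplicated token profile."""
    profile = " ".join(sorted(set(tokens)))
    return hashlib.md5(profile.encode("utf-8")).hexdigest()


# Two documents that share every token except one
doc_a = "senior java developer wanted in sydney".split()
doc_b = "senior java developer wanted in melbourne".split()

sig_a = exact_signature(doc_a)
sig_b = exact_signature(doc_b)

# Count hex positions that coincide; any overlap is chance, not similarity
shared = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
print(sig_a)
print(sig_b)
print(f"hex positions that happen to match: {shared} of {len(sig_a)}")
```

Because a cryptographic hash gives no notion of distance, a scheme like this can only detect documents whose profiles are identical.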

Re: Pagination with Solr

2007-11-19 Thread climbingrose
Hi David, Do you use one of the Solr clients available at http://wiki.apache.org/solr/IntegratingSolr? These clients should already do all the XML parsing for you. I speak from Solrj experience. IMO, your approach is probably the most commonly used when it comes to pagination. Solr caching

Re: Finding all possible synonyms for a word

2007-11-19 Thread climbingrose
One approach is to extend SynonymFilter so that it reads synonyms from a database instead of a file. SynonymFilter is just a Java class so you can do whatever you want with it :D. From what I remember, the filter initialises a list of all input synonyms and stores them in memory. Therefore, you need

Re: multiple delete by id in one delete command?

2007-11-18 Thread climbingrose
The easiest solution I know is: <delete><query>id:1 OR id:2 OR ...</query></delete> If you know that all of these ids can be found by issuing a query, you can do delete by query: <delete><query>YOUR_DELETE_QUERY_HERE</query></delete> Cheers On Nov 19, 2007 4:18 PM, Norberto Meijome [EMAIL PROTECTED] wrote: Hi
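A delete message of this shape can be generated from a list of ids. The sketch below is an illustration only: the helper name delete_by_ids is hypothetical, the unique key field is assumed to be called id, and the ids are assumed to contain no characters that need Lucene query escaping.

```python
from xml.sax.saxutils import escape


def delete_by_ids(ids, id_field="id"):
    """Build a single Solr XML delete-by-query message covering several ids.

    Assumes the ids are simple values (digits or plain words); anything with
    spaces or query syntax characters would need escaping first.
    """
    query = " OR ".join(f"{id_field}:{doc_id}" for doc_id in ids)
    # escape() protects the XML payload if the query contains &, < or >
    return f"<delete><query>{escape(query)}</query></delete>"


print(delete_by_ids([1, 2, 3]))
```

The message would then be posted to the update handler and followed by a commit, as with any other update.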

Re: Spell Check Handler

2007-10-11 Thread climbingrose
Hi all, I've been so busy the last few days so I haven't replied to this email. I modified SpellCheckerHandler a while ago to include support for multiword query. To be honest, I didn't have time to write unit test for the code. However, I deployed it in a production environment and it has been

Re: Spell Check Handler

2007-10-11 Thread climbingrose
configurable. On 10/11/07, climbingrose [EMAIL PROTECTED] wrote: Hi all, I've been so busy the last few days so I haven't replied to this email. I modified SpellCheckerHandler a while ago to include support for multiword query. To be honest, I didn't have time to write unit test for the code

Re: Solr replication

2007-10-01 Thread climbingrose
1) On solr.master:
+ Edit scripts.conf: solr_hostname=localhost solr_port=8983 rsyncd_port=18983
+ Enable and start rsync: rsyncd-enable; rsyncd-start
+ Run snapshooter: snapshooter
After running this, you should be able to see a new folder named snapshot.* in the data/index folder. You can

Re: Re: Re: Solr replication

2007-10-01 Thread climbingrose
From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Re: Solr replication Date: Mon, 1 Oct 2007 15:00:46 +0200 Works like a charm. Thanks very much. cheers Y. Original message Date: Mon, 1 Oct 2007 21:55:30 +1000 From: climbingrose To: solr-user

Re: can solr do it?

2007-09-25 Thread climbingrose
I don't think you can with the current Solr because each instance runs in a separate web app. On 9/25/07, James liu [EMAIL PROTECTED] wrote: if use multi solr with one index, it will cache individually. so i think can it share their cache.(they have same config) -- regards jl --

Synchronize large number of records with Solr

2007-09-14 Thread climbingrose
Hi all, I've been struggling to find a good way to synchronize Solr with a large number of records. We collect our data from a number of sources and each source produces around 50,000 docs. Each of these documents has a sourceId field indicating the source of the document. Now assuming we're

Re: Synchronize large number of records with Solr

2007-09-14 Thread climbingrose
Hi Erik, So in your case #1, documents are reindexed with this scheme - so if you truly need to skip a reindexing for some reason (why, though?) you'll need to come up with some other mechanism. [perhaps update could be enhanced to allow ignoring a duplicate id rather than reindexing?] It's

Re: Searching Versioned Resources

2007-09-12 Thread climbingrose
I think you can use the CollapseFilter to collapse on the version field. However, I think you need to modify the CollapseFilter code to sort by version and get the latest version returned. On 9/13/07, Adrian Sutton [EMAIL PROTECTED] wrote: Hi all, The documents we're indexing are versioned and

Re: Embedded about 50% faster for indexing

2007-08-27 Thread climbingrose
using persistent http connections? Are you threadedly indexing? cheers, -Mike Paul Sundling -Original Message- From: climbingrose [mailto:[EMAIL PROTECTED] Sent: Monday, August 27, 2007 12:22 AM To: solr-user@lucene.apache.org Subject: Re: Embedded about 50

Re: Spell Check Handler

2007-08-17 Thread climbingrose
Thanks Karl. I'll check it out! On 8/18/07, karl wettin [EMAIL PROTECTED] wrote: I updated LUCENE-626 last night. It should now run smooth without LUCENE-550, but smoother with. Perhaps it is something you can use. 12 aug 2007 kl. 14.24 skrev climbingrose: I'm happy to contribute code

Re: Spell Check Handler

2007-08-11 Thread climbingrose
://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel . On 8/11/07, Pieter Berkel [EMAIL PROTECTED] wrote: On 11/08/07, climbingrose [EMAIL PROTECTED] wrote: The spellchecker handler doesn't seem to work with multi-word query. For example, when I

Re: FunctionQuery and boosting documents using date arithmetic

2007-08-11 Thread climbingrose
I'm using a date boosting function as well: F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around 10,000 documents added in one day, rord(creationDate) returns very different values for the same creationDate. For example, the last document added with
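To see why documents from the same day can receive noticeably different boosts, it helps to evaluate the function by hand. The sketch below assumes the documented definition recip(x,m,a,b) = a/(m*x+b) and treats rord as the reverse ordinal of the field value (1 for the newest distinct value). The ordinals 1 and 10000 are illustrative, not taken from the thread, and the trailing ^10 weight is left out.

```python
def recip(x, m, a, b):
    """Reciprocal function as defined for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)


# If every document has a distinct timestamp, 10,000 documents added on the
# same day occupy 10,000 different reverse ordinals (assumed values below).
newest = recip(1, 1, 1000, 1000)          # reverse ordinal 1
oldest_today = recip(10000, 1, 1000, 1000)  # reverse ordinal 10000

print(f"newest document:         {newest:.4f}")
print(f"oldest document of day:  {oldest_today:.4f}")
print(f"ratio:                   {newest / oldest_today:.1f}")
```

Rounding the stored date to day granularity at index time is one way to make same-day documents share an ordinal, at the cost of losing the time of day in that field.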

Re: Spell Check Handler

2007-08-11 Thread climbingrose
Yeah. How stable is the patch, Karl? Is it possible to use it in a production environment? On 8/12/07, karl wettin [EMAIL PROTECTED] wrote: 11 aug 2007 kl. 10.36 skrev climbingrose: There is an issue on Lucene issue tracker regarding multi-word spellchecker: https://issues.apache.org/jira

Re: Spell Check Handler

2007-08-10 Thread climbingrose
know if this is still not clear, I probably will add it to the wiki page soon. cheers, Tristan On 7/9/07, climbingrose [EMAIL PROTECTED] wrote: Thanks for the quick reply. However, I'm still not able to setup spellchecker. Solr does create spell directory under data

Re: Spell Check Handler

2007-08-10 Thread climbingrose
irrelevant suggestions for the location part since the number of terms in location is generally much smaller compared with that of description. Any ideas? Thanks. On 8/11/07, climbingrose [EMAIL PROTECTED] wrote: The spellchecker handler doesn't seem to work with multi-word query. For example

Re: Spell Check Handler

2007-08-10 Thread climbingrose
OK, I just need to define 2 spellcheckers in solrconfig.xml for my purpose. On 8/11/07, climbingrose [EMAIL PROTECTED] wrote: After looking at the SpellChecker code, I realised that it only supports single-word queries. I made a very naive modification of SpellCheckerHandler to get multi-word support

Date rounding up

2007-08-08 Thread climbingrose
Hi all, I think there might be something wrong with the date time rounding. I tried this query: q=*:*&fq=listedDate:[NOW/DAY-1DAY TO *] which I think should return results since yesterday. So if today is the 9th of August, it should return all results from the 8th of August. However, Solr returns
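The expected lower bound of that range can be worked out by hand. Solr date math is applied left to right, so NOW/DAY first rounds down to the start of the current day and -1DAY then subtracts one day. The sketch below mimics only that one expression; the function name is hypothetical, and it assumes the calculation is done in UTC, which is how Solr evaluates date math by default.

```python
from datetime import datetime, timedelta, timezone


def now_day_minus_one_day(now):
    """Emulate the Solr date math expression NOW/DAY-1DAY for a given instant."""
    # NOW/DAY: round down to midnight of the current (UTC) day
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # -1DAY: step back exactly one day
    return start_of_today - timedelta(days=1)


now = datetime(2007, 8, 9, 10, 30, tzinfo=timezone.utc)
print(now_day_minus_one_day(now).isoformat())
```

Note that a client in a timezone well ahead of UTC may see "today" change several hours before the UTC day does, which shifts the computed boundary relative to local expectations.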

Re: mandatory and optional fields in the dismaxrequesthandler

2007-07-30 Thread climbingrose
I think I have the same question as Arnaud. For example, my dismax query has qf=title^5 description^2. Now if I search for Java developer, I want to make sure that the results have at least java or developer in the title. Is this possible with dismax query? On 7/30/07, Chris Hostetter [EMAIL

DisMax query and date boosting

2007-07-19 Thread climbingrose
Hi all, I'm puzzling over how to boost a date field in a DisMax query. Atm, my qf is title^5 summary^1. However, what I really want to do is to allow documents with the latest listedDate to have a better score. For example, documents with listedDate:[NOW-1DAY TO *] have additional score over documents

Re: DisMax query and date boosting

2007-07-19 Thread climbingrose
] wrote: I think in this case you can use a bq (Boost Query) so you can apply this boost to the range you want. <str name="bq">your_date_field:[NOW/DAY-24HOURS TO NOW]^10.0</str> This example will boost your documents with date within the last 24h. Regards, Daniel On 19/7/07 14:45, climbingrose

Re: DisMax query and date boosting

2007-07-19 Thread climbingrose
Just tried the bq approach and it works beautifully. Exactly what I was looking for. Still, I'd like to know which approach is the preferred? Thanks again guys. On 7/20/07, climbingrose [EMAIL PROTECTED] wrote: Thanks for both answers. Which one is better in terms of performance? bq or bf

Re: DisMax query and date boosting

2007-07-19 Thread climbingrose
Thanks for the answer Chris. The DisMax query handler is just amazing! On 7/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : Just tried the bq approach and it works beautifully. Exactly what I was : looking for. Still, I'd like to know which approach is the preferred? Thanks : again guys. i

Slow facet with custom Analyser

2007-07-16 Thread climbingrose
Hi all, My facet browsing performance was decent on my system until I added my custom Analyser. Initially, I facetted the title field, which is of the default string type (no analysers, tokenisers...), and got quick responses (first query is just under 1s, subsequent queries are 0.1s). I created a

Re: Slow facet with custom Analyser

2007-07-16 Thread climbingrose
to 100 if you have the memory Optimizing your index should also speed up faceting (but that is a lot of facets). -Yonik On 7/16/07, climbingrose [EMAIL PROTECTED] wrote: Hi all, My facet browsing performance has been decent on my system until I add my custom Analyser. Initially, I facetted

Re: Slow facet with custom Analyser

2007-07-16 Thread climbingrose
, climbingrose [EMAIL PROTECTED] wrote: Thanks Yonik. In my case, there is only one title field per document so is there a way to force Solr to work the old way? My analyser doesn't break up the title field into multiple tokens. It only tries to format the field value (to lower case, remove unwanted

Re: Slow facet with custom Analyser

2007-07-16 Thread climbingrose
Thanks for the suggestion Chris. I modified SimpleFacets to check for [f.foo.]facet.field.type==(single|multi) and the performance has been improved significantly. On 7/17/07, Chris Hostetter [EMAIL PROTECTED] wrote: : ...but i don't understand why both checking isTokenized() ... shouldn't :

A few questions regarding multi-word synonyms and parameters encoding

2007-07-10 Thread climbingrose
Hi all, I've been using Solr for the last few projects and the experience has been great. I'll post the link to the website once it finishes. Just have a few questions regarding synonyms and parameters encoding: 1) Is multi-word synonyms possible now in Solr? For example, can I have things like

Re: history

2007-07-08 Thread climbingrose
Coincidentally, I have a very similar use case. Thanks for the advice. On 7/8/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 7/7/07, Brian Whitman [EMAIL PROTECTED] wrote: I have been trying to plan out a history function for Solr. When I update a document with an existing unique key, I would like

Re: Dynamic fields performance question

2007-03-26 Thread climbingrose
Thanks Yonik. I think both of the conditions hold true for our application ;). On 3/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 3/26/07, climbingrose [EMAIL PROTECTED] wrote: I'm developing an application that potentially creates thousands of dynamic fields. Does anyone know if large

Dynamic fields performance question

2007-03-25 Thread climbingrose
Hi all, I'm developing an application that potentially creates thousands of dynamic fields. Does anyone know if a large number of dynamic fields will degrade Solr performance? Thanks. -- Regards, Cuong Hoang

Solr use case

2006-10-11 Thread climbingrose
Hi all, Is it true that Solr is mainly used for applications that rarely change the underlying data? As I understand, if you submit new data or modify existing data on Solr server, you would have to refresh the cache somehow to display the updated data. If my application frequently gets new

Multiple schemas

2006-09-26 Thread climbingrose
Hi all, Am I right that we can only have one schema per solr server? If so, how would you deal with the issue of submitting completely different data models (such as clothes and cars)? Thanks. -- Regards, Cuong Hoang

Re: Mobile phone shop + Solr

2006-09-13 Thread climbingrose
I probably need to visualise my models: MobileInfo (1)(1...*) SellingItem MobileInfo has many fields to describe the characteristics of a mobile phone model (color, size..). SellingItem is an instance of MobileInfo that is currently sold by a user. So in the