Re: Bet you didn't know Lucene can...

2011-10-23 Thread Dawid Weiss
Hi Grant, In Carrot2 (and Carrot Search's commercial products) we're not using Lucene as an indexing/search service directly, but we are re-using a lot of internal infrastructure (like analyzers, ported snowball stemmers and other segmentation stuff). We also plan on using the new language

reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
I already have the term-frequency count for all the terms in a document. Is there a way I can reuse that info while indexing? I would like to use Solr for this.

Filter and query precedence, boolean query

2011-10-23 Thread Josh Devins
Hi folks, I'm hoping someone can shed some light on how filters and boolean queries work under the hood. As I understand it, the following two queries are functionally equivalent: (1) boolean must, term query: foo; boolean must, term query: bar. (2) term query: foo; term filter: bar. What I'd like to
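As a rough illustration of why the two forms above select the same documents, here is a minimal sketch in plain Java. It does not use the Lucene API: postings are modelled as java.util.BitSet, and the class and method names (FilterVsBooleanSketch, booleanMust, queryWithFilter) are hypothetical. It only models which documents match. Scoring is left out, and that is where the two forms differ, since a filter does not contribute to the score while a MUST clause does.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: each term's postings are a BitSet of doc ids.
class FilterVsBooleanSketch {

    // Build a postings bitset from a list of document ids.
    static BitSet postings(int... docIds) {
        BitSet bits = new BitSet();
        for (int doc : docIds) {
            bits.set(doc);
        }
        return bits;
    }

    // Form 1: both terms are MUST clauses, so the match set is the intersection.
    static BitSet booleanMust(BitSet foo, BitSet bar) {
        BitSet result = (BitSet) foo.clone();
        result.and(bar);
        return result;
    }

    // Form 2: walk the query's matches and keep only docs the filter accepts.
    static BitSet queryWithFilter(BitSet query, BitSet filter) {
        BitSet result = new BitSet();
        for (int doc = query.nextSetBit(0); doc >= 0; doc = query.nextSetBit(doc + 1)) {
            if (filter.get(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet foo = postings(1, 3, 5, 8);
        BitSet bar = postings(3, 4, 8, 9);
        System.out.println(booleanMust(foo, bar));     // prints {3, 8}
        System.out.println(queryWithFilter(foo, bar)); // prints {3, 8}
    }
}
```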

Re: reusing the term-frequency count while indexing

2011-10-23 Thread ppp c
Of course it can be reused. But from my point of view it's meaningless, since the analysis process still has to be performed to collect things such as prox, offset, syno, payload and so on. On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee prasen@gmail.com wrote: I already have the

Re: performance question - number of documents

2011-10-23 Thread Erick Erickson
"Why would it matter...top 5 matches" Because Lucene has to calculate the score of all documents in order to ensure that it returns the best 5 documents. What if the very last document scored was the most relevant? Best Erick On Sun, Oct 23, 2011 at 3:06 PM, sol myr solmy...@yahoo.com wrote: Hi,
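The point in the reply above, that asking for only the top 5 does not let the scorer stop early, can be sketched in plain Java. This is an illustrative sketch, not Lucene's collector code: the class name TopKSketch, the topK method, and the made-up scores array are all assumptions. A small min-heap keeps memory bounded at k entries, but the loop still has to visit every candidate score, because the best one may come last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: scores[i] stands in for the score of candidate document i.
class TopKSketch {

    // Returns the ids of the k highest-scoring documents, best first.
    static List<Integer> topK(float[] scores, int k) {
        // Min-heap by score: the weakest of the current top k sits at the head.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        // Every candidate is examined; only the heap size is limited to k.
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return result;
    }

    public static void main(String[] args) {
        // The last document has the highest score, so stopping early would miss it.
        float[] scores = {0.2f, 0.9f, 0.1f, 0.4f, 0.95f};
        System.out.println(topK(scores, 2)); // prints [4, 1]
    }
}
```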

Re: Filter and query precedence, boolean query

2011-10-23 Thread Simon Willnauer
hey josh, On Sun, Oct 23, 2011 at 5:39 PM, Josh Devins j...@amenhq.com wrote: Hi folks, I'm hoping someone can shed some light on how filters and boolean queries work under the hood. As I understand it, the following two queries are functionally equivalent: boolean must, term query: foo,

Re: Filter and query precedence, boolean query

2011-10-23 Thread Josh Devins
I'll reply to the thread with your comment from our IM chat in case it helps anyone else thinking about this. In response to what is preferred, boolean query w/ term queries or a term filter+term query, and if order in the boolean query somehow matters: we take care of this internally no matter

Re: Using Lucene to index Wikipedia

2011-10-23 Thread Michael Sokolov
Daniel, since no one knowledgeable has answered I'll take a stab - there are a number of ant targets you can run, most of which incorporate some indexing step(s). Basically you can run: ant -Dtask.alg=alg file. It looks as if the ant build.xml is set up to run conf/micro-standard.alg by

Re: reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
Can you tell me how I can feed the Lucene index by using the term frequency directly? Actually I am getting the documents along with their term frequencies and don't want to write any additional code to expand them. On 10/23/11, ppp c peter.c.e...@gmail.com wrote: Of course, it can be reused.

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-23 Thread prasenjit mukherjee
Any pointers/suggestions on my approach? On 10/22/11, prasenjit mukherjee prasen@gmail.com wrote: My use case is the following: Given an n-dimensional vector (only +ve quadrants/points) find its closest neighbours. I would like to try out with lucene's default ranking. Here is how a