Mark Miller wrote:
Hmmm - you can probably get qsol to do it:
http://myhardshadow.com/qsol. I think you can set up any token to
expand to anything with a regex matcher and use group capturing in the
replacement (I don't fully remember though, been a while since I've
used it).
So you could do
Thanks for all the answers.
I am new to Lucene, and the emails are the first time I have heard of
bigrams, so I read about them a bit.
Question - if I query for cat animal - or use boosting - cat^2
animal^0.5 - will the results return ONLY documents that contain both?
From what I saw until
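A note on the question above: with QueryParser's default OR operator, cat animal matches documents containing either term, and boosts like cat^2 only change the ranking, not which documents match. Requiring both terms is a default-operator setting. A minimal sketch against the 2.4-era QueryParser API (the field name "body" is illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// Require both terms; the boosts still shape the ranking.
QueryParser qp = new QueryParser("body", new StandardAnalyzer());
qp.setDefaultOperator(QueryParser.AND_OPERATOR);
Query q = qp.parse("cat^2 animal^0.5");
```

With the default OR operator, the same parse would produce a query matching documents containing either term.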
Thank you.
I suppose the solution for this is not to create an index but to store
co-occurrence frequencies at the Analyzer level.
Adrian.
On Mon, Mar 16, 2009 at 11:37 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Be careful: docFreq does not take deletions into account.
Hi,
I edited Luke's code so it also uses my classes (I added the jar to the
class-path and put it in the lib folder).
When I run it from Java, it works fine.
Now I try to build it and invoke Luke's jar outside java and get the
following error:
Exception in thread "main" java.lang.NoClassDefFoundError:
Well, assuming that when you say "invoke Luke's jar outside java" you
mean that you are trying to run Luke from the command line, e.g. $ java
-jar lukexxx.jar, it simply sounds like your classes are not on the
classpath. Add them.
--
Ian.
On Tue, Mar 17, 2009 at 10:20 AM, liat oren
Hi Ian,
Thanks for the answer.
Yes, I meant running it from the command line.
They are already in the classpath - I added this part:
<classpathentry kind="lib" path="lib/myJar.jar"/>
2009/3/17 Ian Lea ian@gmail.com
Well, assuming that when you say invoke Luke's jar outside java you
mean that you
Adrian Dimulescu wrote:
Thank you.
I suppose the solution for this is not to create an index but to store
co-occurrence frequencies at the Analyzer level.
I don't understand how this would address the issue that docFreq does
not reflect deletions.
You can use the shingles analyzer (under
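For context, the contrib shingle filter emits word n-grams ("shingles") on top of an ordinary token stream, which is one way to index bigrams directly. A sketch against the 2.4-era contrib API (the tokenizer, input text, and shingle size are illustrative):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

// Emits the unigrams plus two-word shingles of the input,
// e.g. "fast", "fast brown", "brown", "brown fox", "fox".
TokenStream ts = new ShingleFilter(
    new WhitespaceTokenizer(new StringReader("fast brown fox")), 2);
```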
This is all getting very complicated!
Adrian - have you looked any further into why your original two term
query was too slow? My experience is that simple queries are usually
extremely fast. Standard questions: have you warmed up the searcher?
How large is the index? How many occurrences of
Michael McCandless wrote:
I don't understand how this would address the issue that docFreq does
not reflect deletions.
Bad mail-quoting, sorry. I am not interested in document deletion; I
just index Wikipedia once, and want to get a co-occurrence-based
similarity distance between words called NGD
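For reference, the Normalized Google Distance (Cilibrasi and Vitanyi) that Adrian mentions is usually written as:

```latex
NGD(x,y) = \frac{\max\{\log f(x),\, \log f(y)\} - \log f(x,y)}
                {\log N - \min\{\log f(x),\, \log f(y)\}}
```

where f(x) is the number of documents containing x, f(x,y) the number containing both terms, and N the total number of documents. In Lucene terms, f(x) corresponds to docFreq and f(x,y) to the hit count of a two-term AND query, which is why docFreq accuracy matters here.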
org.apache.lucene.analysis.PerFieldAnalyzerWrapper
There's plenty of info about it on the web, even some recent
discussion on this list which will be in the archives.
--
Ian.
On Tue, Mar 17, 2009 at 11:17 AM, Raymond Balmès
raymond.bal...@gmail.com wrote:
I was looking for calling a different
Ian Lea wrote:
Adrian - have you looked any further into why your original two term
query was too slow? My experience is that simple queries are usually
extremely fast.
Let me first point out that it is not too slow in absolute terms, it
is only for my particular needs of attempting the
OK, thanks a lot, I must be very poor at searching ;-)... I kind of
missed this information.
Thx again.
-Ray-
On Tue, Mar 17, 2009 at 12:25 PM, Uwe Schindler u...@thetaphi.de wrote:
It is possible in two ways:
1. Use the analyzer class and generate a TokenStream/Tokenizer from it.
Then
I was looking for calling a different analyzer for each field of a
document... looks like it is not possible.
Do I have it right?
-Ray-
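A different analyzer per field is in fact possible via PerFieldAnalyzerWrapper. A sketch against the 2.4-era API (the field name "id" is illustrative):

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// StandardAnalyzer for every field by default, except "id",
// which is kept as a single untokenized term.
PerFieldAnalyzerWrapper wrapper =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
wrapper.addAnalyzer("id", new KeywordAnalyzer());
// Pass "wrapper" to both IndexWriter and QueryParser so that
// indexing and searching analyze each field the same way.
```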
Added that classpathentry to what? That means nothing to me.
I'd run it from the command line as
$ java -cp whatever -jar whatever.jar
or
$ export CLASSPATH=whatever
$ java -jar whatever.jar
Those examples are unix based. If you're on Windows I imagine there
are equivalents. Or maybe your
I work on windows.
I copied my jar to the lib directory - so it is now together with the other
jars Luke uses (Lucene, etc.).
And added the text below to the .classpath file (which exists in the
luke-src-0.9.1 directory).
2009/3/17 Ian Lea ian@gmail.com
Added that classpathentry to what? That
OK - thanks for the explanation. So this is not just a simple search ...
I'll go away and leave you and Michael and the other experts to talk
about clever solutions.
--
Ian.
On Tue, Mar 17, 2009 at 11:35 AM, Adrian Dimulescu
adrian.dimule...@gmail.com wrote:
Ian Lea wrote:
Adrian - have
Is this a one-time computation? If so, couldn't you wait a long time
for the machine to simply finish it?
With the simple approach (doing 100 million 2-term AND queries), how
long do you estimate it'd take?
I think you could do this with your own analyzer (as you
suggested)... it would run
Hello.
Can I get access to the terms of a field and their frequencies while
indexing a document?
Thanks.
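One common way to get at this is to run the field's analyzer over the text yourself, before (or instead of) handing it to IndexWriter. A sketch against the 2.4-era TokenStream API ("body", the analyzer choice, and the variable text holding the field's raw value are illustrative):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Count term frequencies for one field's text.
Map<String, Integer> freqs = new HashMap<String, Integer>();
TokenStream ts = new StandardAnalyzer()
    .tokenStream("body", new StringReader(text));
for (Token t = ts.next(); t != null; t = ts.next()) {
    String term = t.termText();
    Integer n = freqs.get(term);
    freqs.put(term, n == null ? 1 : n + 1);
}
```

Alternatively, index the field with term vectors enabled and read the per-document frequencies back after indexing via IndexReader.getTermFreqVector.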
Hello Luceners,
what is the official pom.xml fragment to be used for the contribs
package of lucene?
It seems to be only of type pom inside the maven repository... does it
mean that I have to fetch sub-contribs ?
paul
You might try looking in a list that talks about recommender systems.
Google hits:
- http://en.wikipedia.org/wiki/Recommendation_system
- ACM Recommender Systems 2009 http://recsys.acm.org/
- A Guide to Recommender Systems
http://www.readwriteweb.com/archives/recommender_systems.php
2009/3/17
I'm not sure this would fall primarily under recommenders... I would assume
Facebook is doing look-ahead on connections. i.e. A-B, B-C, so suggest
A-C. Then they weight the suggestions by the number of indirect links between
A and C and probably other factors (which is where the generic
Have a look at the Lucene sister project: Mahout: http://lucene.apache.org/mahout
. In there is the Taste collaborative filtering project which is all
about recommendations.
On Mar 17, 2009, at 9:32 AM, Aaron Schon wrote:
Hi all, Apologies if this question is off-topic, but I was
Hi Paul,
On 3/17/2009 at 9:18 AM, Paul Libbrecht wrote:
what is the official pom.xml fragment to be used for the contribs
package of lucene?
It seems to be only of type pom inside the maven repository... does it
mean that I have to fetch sub-contribs ?
Your POM should include dependencies
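The contrib jars are published as individual artifacts, so a POM depends on each sub-contrib it needs, along the lines of (the version shown is an assumption; use the release you need):

```xml
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers</artifactId>
  <version>2.4.0</version>
</dependency>
```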
I am using Lucene to index rows in a spreadsheet; each row is a
Document, and the document indexes 10 fields from the row plus the row
number, used to relate the Document back to the row.
So when someone modifies one of the 10 fields I am interested in, I
have to update the document.
Hi Paul,
If you do not store all the data inside Lucene, you have to get your
updated data from your spreadsheet again. Even if you stored all
the data, you would have to update the document by creating a new one
and adding it to the index using updateDocument(). You cannot update
just one field.
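The update-by-replacement described above can be sketched like this, assuming the row number is indexed as an untokenized field named "rowId" (the field name and value are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Rebuild the full document for the changed row, then atomically
// delete-and-re-add it, keyed on the row number.
Document doc = new Document();
doc.add(new Field("rowId", "42", Field.Store.YES,
                  Field.Index.NOT_ANALYZED));
// ... add the other 10 fields from the spreadsheet row ...
writer.updateDocument(new Term("rowId", "42"), doc);
```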
Hello
I was trying to use the DuplicateFilter API in contrib/queries for Lucene
in an application, but it doesn't seem to be accepted as a valid argument
to the searcher.search function. I'm using Apache Lucene 2.4.0.
Here's what I did.
DuplicateFilter df=new DuplicateFilter(NAME);
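For reference, DuplicateFilter extends Filter, so it should be passed through one of the search overloads that accepts a Filter. In 2.4 that looks roughly like this (the field names and result count are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DuplicateFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Keep only one hit per distinct value of the "name" field.
DuplicateFilter df = new DuplicateFilter("name");
TopDocs hits = searcher.search(
    new TermQuery(new Term("body", "lucene")), df, 10);
```

Note that the contrib lucene-queries jar must be on the classpath at both compile and run time, otherwise the Filter overload will not resolve.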
Michael McCandless wrote:
Is this a one-time computation? If so, couldn't you wait a long time
for the machine to simply finish it?
The final production computation is one-time; still, I have to repeatedly
come back and correct some errors, then retry...
With the simple approach (doing 100
You may want to try Filters (starting from TermFilter) for this, especially
those based on the default OpenBitSet (see the intersection count method)
because of your interest in stop words.
10k OpenBitSets for 39 M docs will probably not fit in memory in one go,
but that can be worked around by
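The intersection-count idea above can be illustrated with java.util.BitSet from the JDK (a hypothetical stand-in: Lucene's OpenBitSet offers a static intersectionCount(a, b) that does the same without copying). One bitset per term, one bit per document:

```java
import java.util.BitSet;

public class CooccurrenceCount {
    // Number of documents containing both terms.
    static long intersectionCount(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone(); // and() is destructive, so copy first
        tmp.and(b);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        BitSet cat = new BitSet();    // docs containing "cat"
        BitSet animal = new BitSet(); // docs containing "animal"
        cat.set(1); cat.set(4); cat.set(7);
        animal.set(4); animal.set(7); animal.set(9);
        System.out.println(intersectionCount(cat, animal)); // prints 2
    }
}
```

Each pairwise count then needs only a bitwise AND rather than a full query, which is what makes the batch approach attractive for 100 million pairs.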
On Mar 17, 2009, at 2:32 PM, Aaron Schon wrote:
how would I go about recommending Jane Doe connect to Frank
Jones? Hope you can help a newbie by pointing out where I should be
looking?
You might also want to read something about it to get you started:
Programming Collective Intelligence
I've recently upgraded to Solr 1.3 using Lucene 2.4. One of the reasons I
upgraded was because of the nicer SearchComponent architecture that let me
add a needed feature to the default request handler. Simply put, I needed to
filter a query based on some additional parameters. So I subclassed
: I suppose SpanTermQuery could override the weight/scorer methods so that
: it behaved more like a TermQuery if it was executed directly ... but
: that's really not what it's intended for.
:
: This is currently the only way to boost a term via payloads.
: BoostingTermQuery extends
: The final production computation is one-time; still, I have to repeatedly
: come back and correct some errors, then retry...
this doesn't really seem like a problem ideally suited for Lucene ... this
seems like the type of problem sequential batch crunching could solve
better...
first
On Mar 17, 2009, at 5:44 AM, liat oren wrote:
Thanks for all the answers.
I am new to Lucene, and the emails are the first time I have heard of
bigrams, so I read about them a bit.
Question - if I query for cat animal - or use boosting - cat^2
animal^0.5 - will the results return ONLY