Hi,
I have a requirement where I have a list of Suppliers (documents for the Lucene
index) and a list of Products (documents again). Each Product has a supplier.
e.g. :
Product - RouterX, Supplier - DLink, Netgear
Product - RouterY, Supplier - Cisco
If I search for Cisco, RouterY should show up.
Sir,
I am using Lucene 2.3.2. I would like to know which fields are being
indexed.
Ex:
doc.get("path");
This statement returns the path of the document.
Like path, what are the other fields of the document used by Lucene?
I went through converting all the class files to Java
Lucene will index and store the fields that you tell it to when a
document is written to the index.
In Lucene 2.4, doc.getFields() returns a List of all the fields in a
document, and probably in 2.3.2 as well. See the javadoc. That will
tell you the fields that have been stored, but I think not
Lucene is not a database. You'll need to flatten the data and yes,
that does mean duplication.
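A minimal sketch of the denormalization Ian describes, in plain Java with no Lucene dependency (the field names "product" and "supplier" are hypothetical): each product document carries a copy of its supplier names, so a search on the supplier field matches the product directly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenDemo {
    // Flatten a product and its suppliers into one "document":
    // the supplier names are duplicated onto the product record.
    static Map<String, String> flatten(String product, List<String> suppliers) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("product", product);
        // Denormalize: all supplier names live on the product document,
        // so a query on "supplier" finds the product itself.
        doc.put("supplier", String.join(" ", suppliers));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = flatten("RouterY", List.of("Cisco"));
        System.out.println(doc.get("supplier")); // Cisco
    }
}
```

With this layout, a search for Cisco on the supplier field returns the RouterY document, at the cost of re-indexing products whenever a supplier changes.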
--
Ian.
On Mon, Nov 23, 2009 at 9:05 AM, sameerpatil nabblegm...@gmail.com wrote:
Hi,
I have a requirement where I have a list of Suppliers (documents for the Lucene
index) and a list of
Use this tool to examine the index: http://www.getopt.org/luke/
I would also suggest getting hold of a Lucene book such as Lucene In Action
(http://www.manning.com/hatcher2/) to get familiar with the basics of
Lucene.
On Mon, Nov 23, 2009 at 4:42 AM, DHIVYA M dhivyakrishna...@yahoo.com wrote:
That was a good solution to my problem and I found my fields for the document.
Actually I was trying it to find out how to implement autosuggest with Lucene.
Can you suggest an idea of how to use autosuggest with Lucene?
Thanks in advance,
Dhivya
--- On Mon, 23/11/09, Ian Lea
After commenting out the collector logic, the time is still more or less the
same.
Anyway, since without the filter collecting the documents is very fast it's
probably something with the filter itself.
I don't know how the filter (or boolean query) works internally, but probably
for 10K or 50K
That was a good solution to my problem and I found my fields for the document.
Good.
Actually I was trying it to find out how to implement autosuggest with Lucene.
Can you suggest an idea of how to use autosuggest with Lucene?
There was something about it recently on this list. Take a
By autosuggest, would you mean similar documents?
In that case you could try the Lucene MoreLikeThis class.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Mon, Nov 23,
Sir,
I actually meant autosuggest such as is available in Google Suggest, similar to
autocomplete,
where users need not type the entire text and instead can go with the
suggestions available.
Thanks in advance,
Dhivya
--- On Mon, 23/11/09, Anshum ansh...@gmail.com wrote:
Hi Joel,
I encounter the same problem.
Could you please elaborate a bit on this?
Many thanks,
Liat
2009/11/2 Joel Halbert j...@su3analytics.com
I opted to use the following query to solve this problem, since it meets
my requirements, for the time being.
+(cheese sandwich) cheese
Now I'm really confused, which usually means I'm making some
assumptions that aren't true. So here they are...
1. You're talking about Filters that contain BitSets, right? Not some other
kind of filter.
2. When you create your 10-50K filters, you wind up with a single filter
by combining
There are some tricks you can apply, but they amount to
keeping your own lists and manipulating them manually. As Ian
says, Lucene isn't a database, and if you find yourself spending
much time trying to *make* it behave like a database you should
probably re-think your approach.
But in this case,
Erick,
Maybe I didn't make myself clear enough.
I'm talking about high level filters used when searching.
I construct a very big BooleanQuery and add 50K clauses to it (I removed the
limit on max clauses).
Each clause is a TermQuery on the same field.
I don't know the internal doc ids that I
Oh my goodness yes. No wonder nothing I suggested made any
difference <g>. Ignore everything I've written
OK, here's something to try, and it goes back to a Filter. Rather than
make this enormous bunch of ORs, try creating a Filter. Use TermDocs
to run through your list of IDs assembling a
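To illustrate why a bit-set filter can beat an enormous bunch of ORs, here is a plain-Java sketch (no Lucene; the doc ids are made up — in real code they would come from TermDocs lookups): the set is built once, and membership during collection is a single bit test rather than scoring 50K clauses per document.

```java
import java.util.BitSet;

public class BitSetFilterDemo {
    // Build the filter once: one bit per document in the index.
    // The matching doc ids are hypothetical; a real implementation
    // would gather them by running TermDocs over the ID list.
    static BitSet buildFilter(int[] matchingDocIds, int indexSize) {
        BitSet filter = new BitSet(indexSize);
        for (int docId : matchingDocIds) {
            filter.set(docId);
        }
        return filter;
    }

    public static void main(String[] args) {
        BitSet filter = buildFilter(new int[] {3, 17, 42}, 100);
        // During collection, membership is an O(1) bit test,
        // instead of evaluating 50K OR clauses per document.
        System.out.println(filter.get(17)); // true
        System.out.println(filter.get(18)); // false
    }
}
```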
Tested it out. It doesn't work. A slop of zero indicates no words between
the provided terms. E.g. my query of "plan _n" returns entries like
"contingency plan".
My workaround for this problem is to use a PhraseQuery, where you can
explicitly set Terms to occur at the same location, to recover
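The idea of terms sharing a position can be simulated in plain Java (this is not the PhraseQuery API itself, just the matching logic it relies on): two terms match "at the same location" only when their token positions are equal.

```java
import java.util.HashMap;
import java.util.Map;

public class SamePositionDemo {
    // Check whether two terms were indexed at the same token position.
    // The positions map stands in for the position information a
    // position-aware query would read from the index.
    static boolean samePosition(Map<String, Integer> positions, String a, String b) {
        Integer pa = positions.get(a), pb = positions.get(b);
        return pa != null && pa.equals(pb);
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = new HashMap<>();
        positions.put("contingency", 0);
        positions.put("plan", 1);
        positions.put("_n", 1); // token injected at the same position as "plan"
        System.out.println(samePosition(positions, "plan", "_n"));          // true
        System.out.println(samePosition(positions, "plan", "contingency")); // false
    }
}
```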
I've taken TermsFilter from contrib which does exactly that and indeed the
speed was reduced to half, which starts to be reasonable for my needs.
I've researched the regular QueryFilter and what I write here might not be
the complete picture:
I found out that most of the time is spent on scoring
A slop of -1 doesn't work either. I get no results returned.
this would be a *really* helpful feature for me if someone might suggest an
implementation as I would really like to be able to do arbitrary span
searches where tokens may be at the same position and also in other
positions where the
On Monday 23 November 2009 17:27:56, Christopher Tignor wrote:
A slop of -1 doesn't work either. I get no results returned.
I think the problem is in the NearSpansOrdered.docSpansOrdered methods.
Could you replace the < by <= in there (4 times) and try again?
That will allow spans at the same
You're trying -1 with ordered, right? Try it with non-ordered.
Christopher Tignor wrote:
A slop of -1 doesn't work either. I get no results returned.
this would be a *really* helpful feature for me if someone might suggest an
implementation as I would really like to be able to do arbitrary span
For auto complete, you could try the following:
1. Run a prefix query. [Could be a fuzzy query]
2. Index using something like ngrams.
e.g. the sample term "term" is indexed as 4 terms, viz:
t
te
ter
term
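The edge-ngram expansion above is easy to sketch in plain Java (no Lucene analyzer involved; a real setup would do this at analysis time): every leading prefix of the term is emitted, so a user's partial input matches an indexed term exactly.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNgramDemo {
    // Generate all leading prefixes ("edge n-grams") of a term.
    // Indexing these lets a typed prefix hit the term directly,
    // without needing a prefix query at search time.
    static List<String> edgeNgrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 1; i <= term.length(); i++) {
            grams.add(term.substring(0, i));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("term")); // [t, te, ter, term]
    }
}
```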
See: http://issues.apache.org/jira/browse/LUCENE-1427
Short form: this is fixed, but not until 2.9.
If you don't want to upgrade, you could always leave the Filter off
your initial query and have your Collector ensure that any docs
were in the
If you just want to autocomplete the current term the user enters,
initialize a TermEnum with the current entered term fragment. If you then
iterate through the termenum, you get all terms that exist in the index
*after* that term (in unicode codepoint order). Stop iterating, when the
term does
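The TermEnum walk described above can be simulated with a sorted set in plain Java (a TreeSet stands in for Lucene's term dictionary, which is also sorted, so tailSet() mirrors seeking the enum to the fragment and iterating until terms stop matching the prefix):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class AutocompleteDemo {
    // Return all "indexed" terms starting with the typed fragment.
    // tailSet(prefix) jumps to the first term >= prefix in sorted
    // (codepoint) order; we stop as soon as a term no longer
    // starts with the prefix, just as one would stop iterating the TermEnum.
    static List<String> suggest(TreeSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break; // past the prefix range
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(List.of("sample", "sand", "sandwich", "search"));
        System.out.println(suggest(terms, "san")); // [sand, sandwich]
    }
}
```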
Thanks so much for this.
Using an un-ordered query, the -1 slop indeed returns the correct results,
matching tokens at the same position.
I tried the same query but ordered both after and before rebuilding the
source with Paul's changes to NearSpansOrdered but the query was still
failing,
Thank you, Mike, for the explanation.
So I understand that all the data is kept even if any of these
merging threads fail. Will Lucene keep attempting merge every
time addDocument is called afterwards once this happened
(and the error is persistent - such as filesystem full)?
Will
IndexWriter will try the merge again, the next time it checks merges
(eg after flushing a new segment, but not after adding a new
document).
You'll only get an exception out of addDocument/commit/flush if they
hit the problem, eg, if on flushing a new segment it runs out of
space.
But often
Also, I noticed that with the above edit to NearSpansOrdered I am getting
erroneous results for normal ordered searches using searches like:
_n followed by work
where, because _n and work are at the same position, the code changes
accept their pairing as a valid in-order result now that the equal
Thanks guys, I get the point, it is best to reindex (hope it isn't very
expensive). And yes, it's true that the suppliers don't change often. I
--
View this message in context:
http://old.nabble.com/Linking-Fields-to-Documents-possible--tp26474610p26485036.html
Sent from the Lucene - Java Users
Hi, I am using lucene 2.9.1 to index a continuous flow of events. My server
keeps an index writer open at all time and write events as groups of a few
hundred followed by a commit. While writing, users invoke my server to
perform searches. Once a day I optimize the index, while writes happen and
When you say getting a reader of the writer do you mean
writer.getReader()? I.e. the new near real-time API in 2.9?
For that API (and in general whenever you open a reader), you must
close it. I think all your files are there because you're not closing your
old readers.
Reopening readers during optimize
On Monday 23 November 2009 20:07:58, Christopher Tignor wrote:
Also, I noticed that with the above edit to NearSpansOrdered I am getting
erroneous results for normal ordered searches using searches like:
_n followed by work
where because _n and work are at the same position the code
This was a really silly idea I had <g>. If your time is being spent in the
scoring
in the first place, keeping the Filter out of the query and checking against
it later in your Collector won't change the timing because you'll have done
all the scoring anyway. But I only thought about it on the way
We are going to add full-text search for our mailbox service.
The problem is we have more than 1 PB of mail there, and obviously we
don't want to add another PB of storage for the search service, so we hope
the index data will be small enough for storage while the search stays
fast.
The lucky thing is that
Hi, I have not worked on a petascale (yet!) - mostly on the scale of tens of
terabytes - but I do think Lucene would be very helpful for such a use case. I
would indeed suggest partitioning the index by users (seems the most
logical, straightforward way; it also offers the security of insulating one
Hello all,
is there any way to update the spell index directory? Please, can anyone help
me out with this?
--
View this message in context:
http://old.nabble.com/updating-spell-index-tp26490695p26490695.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
A sharded architecture (i.e. smaller indexes) used by Google for
example and implemented by open source in the Katta project may be
best for scaling to sizable levels. Katta is also useful for
redundancy and fault tolerance.
On Mon, Nov 23, 2009 at 6:35 PM, fulin tang tangfu...@gmail.com wrote:
String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader,
"contents", true);
This is how I am retrieving my "did you mean" words.
Grant Ingersoll-6 wrote:
How are you invoking the spell checker?
On Nov 19, 2009, at 1:22 AM, m.harig wrote:
hello all
i've a doubt in
1) Correct: I am using IndexWriter.getReader(). I guess I was assuming that
was a privately owned object and I had no business dealing with its
lifecycle. The API would be clearer if the operation were renamed createReader().
2) How much transient disk space should I expect? Isn't this pretty much
Hi all!
I've just started my adventure with Lucene and I've got one question
regarding indexing.
Does Lucene have a built-in mechanism to store indexes first in RAM
and, after some time or after some number of documents are added, move them
to the FS? And searching docs all the time in both
fulin tang wrote:
We are going to add full-text search for our mailbox service.
The problem is we have more than 1 PB of mail there, and obviously we
don't want to add another PB of storage for the search service, so we hope
the index data will be small enough for storage while the search stays
fast
Hi Rafal,
If what I understand about your implementation is correct, you could try a
ParallelMultiSearcher:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/ParallelMultiSearcher.html