Thanks a lot for your help. I am now using the query as documented for phrases.
Regards,
Ruchi
On 3/13/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: ok, so does that mean I can use both q1 and q2 for phrase queries, i.e., for
: searching words adjacent to each other. Actually that was my only
: concern,
Hi Xiong,
Your ranking idea sounds interesting ... are you looking into
something akin to the TrafficRank algorithm? This is moving into the
realm of 'Personalized search' (or 'Personalised search'), something I'm
not aware of appearing on the Lucene mailing lists so far, but something
I'm quite
daniel rosher daniel.rosher at hotonline.com writes:
We regularly open a new IndexReader, and before this reader replaces the
production one, we determine f(D) for all documents so that for the user
there is almost no performance issue, i.e. f(D) is cached. I suspect you
can implement
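The warm-up pattern Daniel describes might look roughly like this (a minimal sketch against the Lucene 2.x API; `computeF`, `productionReader`, `productionCache`, and `swapLock` are hypothetical names, not from his post):

```java
import org.apache.lucene.index.IndexReader;

// Open the new reader and precompute f(D) for every document BEFORE it
// replaces the production reader, so searches never pay the cost on the fly.
IndexReader newReader = IndexReader.open(directory);
float[] fCache = new float[newReader.maxDoc()];
for (int id = 0; id < newReader.maxDoc(); id++) {
    if (!newReader.isDeleted(id)) {
        fCache[id] = computeF(newReader.document(id)); // hypothetical scoring function
    }
}
// Swap atomically: searches started after this point see the warmed reader.
synchronized (swapLock) {
    IndexReader old = productionReader;
    productionReader = newReader;
    productionCache = fCache;
    old.close();
}
```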
I'm using Lucene for indexing my Nutch crawls, but I don't really understand
the difference between the flags Field.Store.YES and NO. It seems (using Luke) I
can still read some data that was not 'Store.YES'. Where is this data stored
if it's not in the index? What is better to use for small fields?
Hi all,
I am using StopAnalyzer for indexing and searching. I am searching for phrases.
q1 - a b
this query gives me all documents containing a b, but also gives
documents containing a b
again q2 - a b
this query q2 gives documents containing a b, but also gives
documents containing a b
How
Hi,
I am new to Lucene Java API.
I want to use StandardAnalyzer for tokenizing my document. How can I
use it?
Further, how can I index an acronym and a company name as one term?
I know we can do this using StandardAnalyzer, but I am not sure of the
way.
Thanks in advance
Sandeep
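A minimal sketch of both pieces against the Lucene 2.x API (the field names and text here are made up for illustration): pass a StandardAnalyzer to the IndexWriter for ordinary tokenization, and use an UN_TOKENIZED field when a value such as a company name should stay a single term.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
Document doc = new Document();
// Tokenized by StandardAnalyzer (lowercasing, stop words, acronym handling).
doc.add(new Field("contents", "I.B.M. announced quarterly results",
                  Field.Store.YES, Field.Index.TOKENIZED));
// UN_TOKENIZED keeps the entire value as one term in the index.
doc.add(new Field("company", "International Business Machines",
                  Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
writer.close();
```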
You can read the demo source code in the Lucene source package.
2007/3/16, [EMAIL PROTECTED] [EMAIL PROTECTED]:
Hi,
I am new to Lucene Java API.
I want to use StandardAnalyzer for tokenizing my document. How can I
use it?
Further, how can I index an acronym and a company name as one term?
I
This confused me at first too, so here's my current understanding...
When you use YES, you store the actual data as-is with the document.
This is entirely independent of indexing. Internally, I assume that
searching and storing are separate parts of the index that
have nothing to do with each
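A sketch of the three combinations (Lucene 2.x API; the field names and values are made up): 'stored' controls whether the original value comes back with a hit, 'indexed' controls whether the field is searchable, and the two are independent. That is why Luke can still show terms for a Store.NO field: the tokens live in the inverted index even though the raw text was discarded.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// Indexed AND stored: searchable, and doc.get("title") returns the text with a hit.
doc.add(new Field("title", "Lucene in Action",
                  Field.Store.YES, Field.Index.TOKENIZED));
// Indexed but NOT stored: searchable, but doc.get("body") returns null at search time.
doc.add(new Field("body", "a long body that would bloat the index if stored",
                  Field.Store.NO, Field.Index.TOKENIZED));
// Stored but NOT indexed: retrievable with the hit, but no query can match on it.
doc.add(new Field("url", "http://example.com/1",
                  Field.Store.YES, Field.Index.NO));
```

For small fields the storage overhead is usually negligible, so Store.YES is the convenient choice; Store.NO mainly pays off for large bodies of text.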
What analyzers are you using at index and search time? I suspect
that the '"' is being removed both at index and search. So, you've
only indexed the tokens 'a' and 'b' and by the time you get out
of the query parser, you're only searching for terms 'a' 'b'.
Did you bother using query.toString()
Also, See SynonymAnalyzer in Lucene In Action.
Erick
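That debugging step might look like this (Lucene 2.x API; the field name is made up). Because StopAnalyzer drops stop words such as 'a' during analysis, printing the parsed query makes the surprising matches visible:

```java
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

QueryParser parser = new QueryParser("contents", new StopAnalyzer());
Query q = parser.parse("\"a b\"");
// Shows what will actually be searched after analysis; with StopAnalyzer
// the stop word "a" has already been removed from the phrase.
System.out.println(q.toString("contents"));
```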
On 3/16/07, James liu [EMAIL PROTECTED] wrote:
You can read the demo source code in the Lucene source package.
2007/3/16, [EMAIL PROTECTED] [EMAIL PROTECTED]:
Hi,
I am new to Lucene Java API.
I want to use StandardAnalyzer for
Thanks Erick. I will try out the suggestions. I am using StopAnalyzer.
Regards,
Ruchi
On 3/16/07, Erick Erickson [EMAIL PROTECTED] wrote:
What analyzers are you using at index and search time? I suspect
that the '"' is being removed both at index and search. So, you've
only indexed the tokens
For searching phrases there's no need to detect the phrases at indexing time
- the position of each word is saved in the index and then used at search
time to match phrase queries. (Also see the 'query syntax' document.)
Lucene takes plain text as document input - extraction of content text and
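A sketch of what happens at search time (Lucene 2.x API; the index path, field, and terms are made up): the PhraseQuery is matched purely from the term positions that were recorded during indexing.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("contents", "quick"));
phrase.add(new Term("contents", "fox"));
phrase.setSlop(1); // allow one position in between, e.g. "quick brown fox"
IndexSearcher searcher = new IndexSearcher("/tmp/index");
Hits hits = searcher.search(phrase); // matched via indexed positions only
```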
Hi Peter,
Shouldn't the search perform the euclidean distance check during filtering as
well, though? Otherwise you may report highly relevant hits to the user
outside the range they specified, particularly as the search radius
gets larger.
Cheers,
Dan
On 1/28/07, Peter Keegan
Hi Matt,
To verify I understand correctly, are these your settings?:
- one MAIN index containing all the data:
used for search;
never does addDocument();
- Several side INC indexes:
addDocument() here for new/modified documents;
never searched;
- at some point all INC indexes are merged
: Sounds like there's nothing out of the box to solve my problem; if
: I write something to update lucene indexes in place I'll follow up
: about it in here (don't know that I will though; building a new,
: narrower index is probably more expedient and will probably be fast
: enough for my
: Nutch recently added a search query timeout (NUTCH-308). Are there any
: plans to add such functionality to the Lucene HitCollector directly? Or
: is there some reason that this is a bad idea?
Quickly skimming the patch in that Issue, Nutch seems to have done what
has been discussed
Dan,
The filtering is done in the HitCollector by the bounding box, so the only
hits that get collected are those that match the keywords, the bounding box,
and some Lucene filters (BitSets) (I'm probably overloading the word
'filter' a bit). So, the only hits from the collector that need to be
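A collector along those lines might be sketched like this (Lucene 2.x HitCollector API; the coordinate arrays and the box/radius variables are hypothetical stand-ins, not Peter's actual code). Dan's exact-distance check would slot in right after the cheap bounding-box test:

```java
import org.apache.lucene.search.HitCollector;

searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        double x = xCoord[doc], y = yCoord[doc]; // hypothetical cached coordinates
        // Cheap bounding-box rejection first...
        if (x < minX || x > maxX || y < minY || y > maxY) return;
        // ...then the exact euclidean-distance check.
        double dx = x - centerX, dy = y - centerY;
        if (dx * dx + dy * dy <= radius * radius) {
            collected.add(new Integer(doc));
        }
    }
});
```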
Yet another idea just occurred. Remember that documents in
Lucene do not all have to have the same fields. So what if you had
a *very special document* in your index that contained only the
changing info? Perhaps in XML or even binary format? Then, updating
your index would only involve deleting
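A sketch of maintaining that special document (Lucene 2.x API; the field names and the `serializedChanges` value are made up). Since Lucene has no in-place update, 'updating' means deleting the old copy by a key term and adding a fresh one, which Lucene 2.1's updateDocument does in a single call:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), false);
Document special = new Document();
// The key field identifies the one special document among ordinary ones.
special.add(new Field("docType", "changing-info",
                      Field.Store.YES, Field.Index.UN_TOKENIZED));
// The changing info itself: stored only, since it is read back, not searched.
special.add(new Field("payload", serializedChanges,
                      Field.Store.YES, Field.Index.NO));
// Deletes any document whose docType term matches, then adds the new one.
writer.updateDocument(new Term("docType", "changing-info"), special);
writer.close();
```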
Note: this is a reply to a posting to java-dev --Peter
Eric,
Now that it is live, is performance pretty good?
Performance is outstanding. Each server can easily handle well over 100 qps
on an index of over 800K documents. There are several servers (4 dual core
(8 CPU) Opteron) supporting
Hi,
I am trying to index the content from XML files, which are basically the
metadata collected from a website that has a huge collection of documents.
This metadata XML has control characters which cause errors while trying to
parse it using the DOM parser. I tried to use encoding = UTF-8 but