g'day,
i've two questions.
let's say the following is my index with 2 fields: title and contents
title   contents
beer    beer is good
beer    beer is good
cat     sleepy
dog     what a cute one!
beer
: Is it possible to perform a search using fields instead of terms, eg.
: like this sql:
: SELECT col1, col2
: FROM table1
: WHERE col1 = col2
presumably col1 and col2 are untokenized fields? (otherwise equality
is kind of vague)
if you really wanted to add a constraint like this to an existing
: 2. Recreating the index from scratch will require the moving of the
: heavens and the earth.
:
: My crazy idea - can we add new Documents to the index with the Fields
: we wish to add, and duplicate file IDs? i.e. an entry for file ID Foo
: would consist of two Documents,
: Document X:
take a look at the HitCollector and Filter APIs .. you can implement any
logic you want in either of those classes to restrict what results you get
-- and the FieldCache gives you an easy way to check what the value of a
particular indexed field is.
storing the mappings of field value to best
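To make the collector idea above concrete, here is a minimal sketch in plain Java. It does not use Lucene itself: the `Collector` interface and the two `String[]` arrays are hypothetical stand-ins for a hit collector and for the per-document value arrays a field cache would hand back, and the sample values are made up. It only shows the shape of the logic, i.e. keeping a hit when two untokenized field values are equal.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldEqualitySketch {
    /** Hypothetical stand-in for a hit collector: called once per matching doc id. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Keeps only the hits whose two cached field values are equal. */
    static class EqualFieldsCollector implements Collector {
        private final String[] col1; // col1 value per doc id (what a field cache would hold)
        private final String[] col2; // col2 value per doc id
        final List<Integer> accepted = new ArrayList<>();

        EqualFieldsCollector(String[] col1, String[] col2) {
            this.col1 = col1;
            this.col2 = col2;
        }

        @Override
        public void collect(int doc, float score) {
            String a = col1[doc];
            // A doc with no value for col1 is never treated as a match.
            if (a != null && a.equals(col2[doc])) {
                accepted.add(doc);
            }
        }
    }

    /** Feeds every doc id to the collector, as a match-all query would. */
    static List<Integer> docsWithEqualFields(String[] col1, String[] col2) {
        EqualFieldsCollector collector = new EqualFieldsCollector(col1, col2);
        for (int doc = 0; doc < col1.length; doc++) {
            collector.collect(doc, 1.0f);
        }
        return collector.accepted;
    }

    public static void main(String[] args) {
        String[] col1 = {"beer", "cat", "dog"};
        String[] col2 = {"beer", "sleepy", "dog"};
        System.out.println(docsWithEqualFields(col1, col2)); // prints [0, 2]
    }
}
```

In a real searcher the loop in `docsWithEqualFields` would be replaced by whatever query drives the collector; the comparison inside `collect` is the part that matters.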
:! If a document does not contain a queryterm this score can be larger
: or smaller than 0 !
if a document doesn't contain a term, then the scorer for that query will
never even try to score that document -- regardless of what your
Similarity class looks like.
if you really want this kind
On Jun 9, 2006, at 2:10 AM, Chris Hostetter wrote:
: 2. Recreating the index from scratch will require the moving of the
: heavens and the earth.
:
: My crazy idea - can we add new Documents to the index with the Fields
: we wish to add, and duplicate file IDs? i.e. an entry for file ID
He he, nice comparison!
Cheers for the advice.
Rob.
-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 09 June 2006 08:00
To: java-user@lucene.apache.org
Subject: RE: Property comparison possible??
: Is it possible to perform a search using fields instead of
I am no longer a Jira virgin.
http://issues.apache.org/jira/browse/LUCENE-594 Thanks again.
-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 09 June 2006 07:13
To: java-user@lucene.apache.org
Subject: RE: Compound / non-compound index files and SIGKILL
: Whom
Hi Bob,
No idea if this would work BUT...
If the old index is optimized then you might be able to iterate through
all the docs in your old index (sorted by doc id) and for each iteration
add the corresponding doc to the new index so it has a matching doc id.
The idea being that after searching
My Lucene version is 1.4.3 and it has always worked. Someday I will have to
move to Lucene 2.0, but that isn't the problem. The problem is something
like this: one index has something indexed, and the other index is only
created but without any document.
It's very strange because this
:! If a document does not contain a queryterm this score can be larger
: or smaller than 0 !
if a document doesn't contain a term, then the scorer for
that query will never even try to score that document --
regardless of what your Similarity class looks like.
if you really want
Hi Patricio,
As of now, I don't think this is possible. However, we are slowly but
surely working on similar problems. Please feel free to add your two
cents to http://wiki.apache.org/jakarta-lucene/FlexibleIndexing as we
are considering several new ideas related to making indexing more
hey,
i am using the pmsearcher to retrieve data from a number of ram indexes. i
am calling my own search function which calls the indexsearcher.search method
and returns the top 100 ids/scores , however, before returning the topdocs i
start a separate thread which requeries the index
Hi all,
my index size has grown too much and I keep getting OutOfMemoryError after
running a few searches. I am using all the RAM that the JVM is allowing me, 2.6GB.
I am left with two solutions now, the easy and expensive solution is to upgrade
the hardware to a 64-bit System and use more RAM.
Hi All,
Has anyone else out there come across the shortcomings of the new
COMMIT_LOCK_TIMEOUT in regards to
searching on an actively updated Index?
It used to be a settable system property and therefore semi dynamic
across a system with multiple readers/searchers and
one writer. I am aware
I have an integer field that I've indexed after converting to a string
using NumberTools.longToString().
Now I want to sort my results using this field. Everything works when
treating the field as a string, but is very slow and memory intensive.
I want to use INT sorting instead, but these
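As a side illustration of why sorting on a string-encoded number works at all, here is a minimal sketch. The `pad` encoder is hypothetical and is NOT the encoding `NumberTools.longToString()` produces; it only demonstrates the general idea that a fixed-width encoding makes lexicographic order agree with numeric order. Sorting this way still means holding one string per document, which fits the memory cost described above.

```java
import java.util.Arrays;

public class PaddedNumberSketch {
    /** Hypothetical encoder (not NumberTools): zero-pads a non-negative long to 19 digits. */
    static String pad(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format("%019d", n);
    }

    /** Sorts numbers by sorting their padded string forms, then decodes them again. */
    static long[] sortViaStrings(long[] values) {
        String[] encoded = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = pad(values[i]);
        }
        Arrays.sort(encoded); // plain lexicographic String order
        long[] sorted = new long[values.length];
        for (int i = 0; i < encoded.length; i++) {
            sorted[i] = Long.parseLong(encoded[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded, "10" would sort before "9"; padded, the order is numeric.
        System.out.println(Arrays.toString(sortViaStrings(new long[]{10, 9, 100})));
    }
}
```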
Hi All! I have a problem... When I index text documents in English, there
is no problem, but when I index Spanish text documents (and they're big),
a lot of information from the document doesn't get indexed (I suppose it
is due to the Analyzer). However, I want to index ALL the strings in the
I compared Solr's DocSetHitCollector and counting bitset intersections to
get facet counts with a different approach that uses a custom hit collector
that tests each docid hit (bit) with each facets' bitset and increments a
count in a histogram. My assumption was that for queries with few hits,
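The two counting strategies being compared can be sketched with `java.util.BitSet`. This is only an illustration of the two shapes of the computation, not Solr's actual DocSetHitCollector code; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.BitSet;

public class FacetCountSketch {
    /** Approach A: intersect the hit set with each facet's bitset and count the bits. */
    static int[] countByIntersection(BitSet hits, BitSet[] facets) {
        int[] counts = new int[facets.length];
        for (int f = 0; f < facets.length; f++) {
            BitSet tmp = (BitSet) hits.clone(); // and() mutates, so work on a copy
            tmp.and(facets[f]);
            counts[f] = tmp.cardinality();
        }
        return counts;
    }

    /** Approach B: walk each hit once and test it against every facet's bitset. */
    static int[] countPerHit(BitSet hits, BitSet[] facets) {
        int[] histogram = new int[facets.length];
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            for (int f = 0; f < facets.length; f++) {
                if (facets[f].get(doc)) {
                    histogram[f]++;
                }
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(1);
        hits.set(3);
        hits.set(4);
        BitSet facetA = new BitSet();
        facetA.set(1);
        facetA.set(2);
        BitSet facetB = new BitSet();
        facetB.set(3);
        facetB.set(4);
        BitSet[] facets = {facetA, facetB};
        System.out.println(Arrays.toString(countByIntersection(hits, facets))); // prints [1, 2]
        System.out.println(Arrays.toString(countPerHit(hits, facets)));         // prints [1, 2]
    }
}
```

Approach A costs roughly one pass over the index-sized bitset per facet, while approach B costs (number of hits) x (number of facets) bit tests, which is why the number of hits matters for the comparison.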
If the old index is optimized then you might be able to iterate through
all the docs in your old index (sorted by doc id) and for each iteration
add the corresponding doc to the new index so it has a matching doc id.
The idea being that after searching on one index you could use the doc
: : would consist of two Documents,
: : Document X: fileID:Foo, contents:unknown
: : Document Y:fileID:Foo, title:Bar, url:www.baz.com, etc.
: add another document with the same fileID and a title field and a url
: field, and you search for contents:germany you're still going to get
: back
: For example: a query containing two terms: fast, car, having
: document frequencies 300.000 and 20.000 in the index respectively. In a
: worst case scenario this would require 320.000 document scores to be
: calculated. I am not really sure how lucene optimizes its search, but I
: guess it
: That kinda would be the point - contents:germany would get the same
: fileIDs, but contents:germany title:medicine would (hopefully) give
: us a more specific query.
when you say contents:germany title:medicine i'm not sure if you are
assuming that both clauses are mandatory or optional
Hi All! I have a problem... When I index text documents in English, there
is no problem, but when I index Spanish text documents (and they're big),
a lot of information from the document doesn't get indexed (I suppose it
is due to the Analyzer, but if the document is less than 400kb it
works
On Friday 09 June 2006 21:31, manu mohedano wrote:
Hi All! I have a problem... When I index text documents in English,
there is no problem, but when I index Spanish text documents (and
they're big), a lot of information from the document doesn't get
indexed
Read the FAQ at
Hi,
From: manu mohedano [mailto:[EMAIL PROTECTED]
Hi All! I have a problem... When I index text documents in
English, there is no problem, but when I index Spanish text
documents (and they're big), a lot of information from the
document doesn't get indexed (I suppose it is due to the
: fileID twice .. if you mean you want the list of fileIDs that match both
: clauses, you're not going to get any results back -- because no doc with a
: contents field is going to have a title field, and no doc with a title
: field is going to have a contents field.
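The point quoted above can be shown with a small plain-Java model (no Lucene involved). Documents are modelled as hypothetical field-to-text maps, matching is a naive whitespace term lookup, and the field values are invented for the example. Requiring both clauses within one document finds nothing, whereas matching each clause separately and intersecting on fileID does find the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SplitDocumentSketch {
    /** Naive match: the field exists and contains the term as a whitespace-separated word. */
    static boolean matches(Map<String, String> doc, String field, String term) {
        String value = doc.get(field);
        return value != null && Arrays.asList(value.split("\\s+")).contains(term);
    }

    /** Both clauses mandatory within a single document. Assumes every doc has a fileID. */
    static Set<String> sameDocument(List<Map<String, String>> docs,
                                    String f1, String t1, String f2, String t2) {
        Set<String> ids = new TreeSet<>();
        for (Map<String, String> doc : docs) {
            if (matches(doc, f1, t1) && matches(doc, f2, t2)) {
                ids.add(doc.get("fileID"));
            }
        }
        return ids;
    }

    /** Each clause matched on its own, then the two fileID sets are intersected. */
    static Set<String> sameFileId(List<Map<String, String>> docs,
                                  String f1, String t1, String f2, String t2) {
        Set<String> first = new TreeSet<>();
        Set<String> second = new TreeSet<>();
        for (Map<String, String> doc : docs) {
            if (matches(doc, f1, t1)) first.add(doc.get("fileID"));
            if (matches(doc, f2, t2)) second.add(doc.get("fileID"));
        }
        first.retainAll(second);
        return first;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("fileID", "Foo", "contents", "travel in germany"), // like Document X
                Map.of("fileID", "Foo", "title", "medicine"));            // like Document Y
        System.out.println(sameDocument(docs, "contents", "germany", "title", "medicine")); // prints []
        System.out.println(sameFileId(docs, "contents", "germany", "title", "medicine"));   // prints [Foo]
    }
}
```

The second method corresponds to running two searches and joining the results on fileID in application code, which is extra work the single-document layout would not need.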
: I'd want both
: I have an integer field that I've indexed after converting to a string
: using NumberTools.longToString().
: Now I want to sort my results using this field. Everything works when
: treating the field as a string, but is very slow and memory intensive.
:
: I want to use INT sorting instead, but
On 6/8/06, Bob Arens [EMAIL PROTECTED] wrote:
I've been handed a legacy index containing Documents with two Fields;
one is a file ID, the other is contents of the file. The contents
field was added using UnStored. Now, we want to add fields. Oh, the
humanity!
My crazy idea - can we add new
Couple of things.
1 you can use a different analyzer to NOT remove stopwords. SimpleAnalyzer
comes to mind (though watch out for case). Look at LuceneInAction for an
explanation of several analyzers that are available.
2 If memory serves, Lucene defaults to indexing only the first 10,000
words
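A plain-Java sketch of what such a cap does to a large document, for illustration only. It does not call Lucene; the tokenizer is a naive whitespace split and `indexedTerms` is a made-up helper. In Lucene itself the cap is, as far as I recall, the max field length setting on IndexWriter, so check the Javadoc for your version before relying on that.

```java
import java.util.HashSet;
import java.util.Set;

public class FieldLimitSketch {
    /** Returns the distinct terms that would be indexed if only the first maxTerms tokens count. */
    static Set<String> indexedTerms(String text, int maxTerms) {
        Set<String> terms = new HashSet<>();
        int seen = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            if (seen >= maxTerms) {
                break; // everything after the cap is silently dropped
            }
            terms.add(token);
            seen++;
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "uno dos tres cuatro cinco";
        Set<String> terms = indexedTerms(text, 3);
        System.out.println(terms.contains("tres"));  // prints true
        System.out.println(terms.contains("cinco")); // prints false
    }
}
```

This matches the symptom described in the thread: small documents index completely, while in big ones the text past the cap is never searchable.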
Problem Solved! Thanks a lot guys!!!