See below.

On 2/19/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:

Hi all,

I have a few question regarding indexing documents.

1. With my experience of indexing documents with lucene so far I have
done things like:

Doc.Add(Field.Text("album", Album));

Where Album is a string representing an album name.  Now with this sort
of indexing what I do is a search such as:

"album:Thriller"

a) Does this mean that I cannot do an search across all fields  by
submitting the query:

"Thriller"?  In other words by submitting this query would my code
search all fields?


No. If you just submit "Thriller", you'll only search the default field. See
QueryParser for the default field.


b) Is there a way in which I can index elements of a document without
naming the field.  What would the impact of such a use of the indexing
capabilities of Lucene be?


I don't think this makes sense in Lucene terms. All elements in a document
have a field. You can index everything into one field if you need an
aggregate, which gives you this same result.

Do note, however, that there's no requirement that all documents have the
same fields.


2. Is there a limit to the number of
a) named fields per document that I can store


I think there is, but it's absurdly high. Don't worry about this....


b) non-named fields per document that I can store


0 since I don't think you can.


3.

a) Is it possible in Lucene to conduct searches that are very complex
such as:

((album = Thriller AND artist = (Michael OR Jackson)) OR (date between X
AND Y)) AND (label = sony OR Epic)   etc...


Yes


b) For such a query what are the performance penalties compared to a
simple search involving 1 term?


In the immortal words of Mr. Hatcher.. .it depends. You'll really just have
to experiment and find out. It can probably be approximated by taking the
sum of the individual queries as the upper limit. The real killer is
wildcards..... The real question isn't "what is the effect on performance",
it's "is the performance good enough for my application". Which varies as
the characteristics of the database change.

I would argue that a 1M index will process arbitrarily complex queries "fast
enough". The same may not be true for a 100G index. So this question is
really unanswerable in the abstract.


Cheers

Sachin



This email and any attached files are confidential and copyright
protected. If you are not the addressee, any dissemination of this
communication is strictly prohibited. Unless otherwise expressly agreed in
writing, nothing stated in this communication shall be legally binding.

The ultimate parent company of the Atkins Group is WS Atkins
plc.  Registered in England No. 1885586.  Registered Office Woodcote Grove,
Ashley Road, Epsom, Surrey KT18 5BW.

Consider the environment. Please don't print this e-mail unless you really
need to.

Reply via email to