My suggestion would be:
An "all" field that captures all your attributes and allows for generic, easy search across all products. Additionally, go ahead and index all your fields per document. Then, for your default search, use the "all" field. _If_ you know what category of products you are in (e.g. TVs), then you can search against the fields that you know are on TVs. This way, you have a set of fields per product type, and you make sure that all instances of that product type have those fields.
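
To make the idea concrete, here is a minimal, library-agnostic sketch in plain Java. It is not Lucene API code: the class, the method names, the field name "all", and the sample attributes are all hypothetical, chosen only to show how a catch-all value could be derived from per-product attributes and how the search fields could be picked depending on whether the category is known.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Library-agnostic sketch of the "all" field idea; every name here is hypothetical. */
final class CatchAllFieldSketch {

    // Assumed name for the catch-all field; the thread only calls it an "all" field.
    static final String ALL_FIELD = "all";

    /** Returns the fields to index: each attribute as its own field, plus the catch-all. */
    static Map<String, String> buildFields(Map<String, String> attributes) {
        Map<String, String> fields = new LinkedHashMap<>(attributes);
        StringJoiner all = new StringJoiner(" ");
        for (String value : attributes.values()) {
            all.add(value);
        }
        fields.put(ALL_FIELD, all.toString());
        return fields;
    }

    /** Chooses which fields to query: the category's known fields, else the catch-all. */
    static List<String> searchFields(String category, Map<String, List<String>> fieldsByCategory) {
        if (category != null && fieldsByCategory.containsKey(category)) {
            return fieldsByCategory.get(category);
        }
        return Collections.singletonList(ALL_FIELD);
    }

    public static void main(String[] args) {
        // Made-up sample product, for illustration only.
        Map<String, String> tv = new LinkedHashMap<>();
        tv.put("sku", "TV-100");
        tv.put("price", "499");
        tv.put("size", "42in");
        tv.put("make", "Acme");
        System.out.println(buildFields(tv));

        Map<String, List<String>> fieldsByCategory = new LinkedHashMap<>();
        fieldsByCategory.put("TV", Arrays.asList("size", "make", "weight"));
        System.out.println(searchFields("TV", fieldsByCategory)); // category known
        System.out.println(searchFields(null, fieldsByCategory)); // default search
    }
}
```

In Lucene itself, each entry of the returned map would presumably become a field on the Document, with the catch-all indexed for search but not necessarily stored.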

I don't think there is a need for separate indices in this case. The tradeoff with the "all" approach is that some of your stats may be skewed, but the effect probably isn't noticeable for this kind of thing.


On Jan 3, 2008, at 1:18 PM, Dai, Chunhe wrote:

Thanks to all of you who made suggestions. I greatly appreciate them.

Our issue is that our data has the notion of a family. For example, a
product family could contain products like TV, Car, DVD, etc. Of
course, each individual product type would have its own definition,
which contains the finite set of attributes that describe the actual
product, like TV or Car. For example, TV would have size, make, and
weight; Car might have year of manufacture, number of doors, etc. And
of course, all of them have SKU and price as common attributes.

When we set up the index, I originally thought a good approach would be
to base it on the definition, which means I would set up one index for
TV, another for Car, a third for DVD, and so on. When the idea was
presented, people asked whether it is possible to put all the products
in one index called Product and whether that would cause any problems.
They basically want to be able to search on one common attribute in the
index and bring back TV, Car, and DVD results at the same time. That is
how the question got started, and I needed to find out whether this
one-index-per-family approach would cause trouble down the line.

Thanks again for your help.

-Chunhe

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 03, 2008 1:03 PM
To: java-user@lucene.apache.org
Subject: Re: Suggested number of fields limit per Index

Another issue is how to generate queries.  If you have hundreds of
fields, you may have to generate queries (e.g. using the
MultiFieldQueryParser) across all those fields just to find documents
that _could_ have those fields.  This can lead to the dreaded
TooManyClauses exception.

That being said, Lucene can handle that many fields, though I doubt
many would consider it a best practice.  I don't think there would be
any indexing performance issues.  The number of fields can be a search
issue, but I don't know enough about your requirements to say for
sure.

I would say that if you have alternative approaches that you think will
work for your other requirements and use fewer fields, then give that a
try.  I don't know if I would go so far as to say all fields should be
in common, but that is a good thing to aim for, as it makes things
easier.  Are you sure you can't just map your fields into a common set?
Perhaps if you described the problem a bit more, we can help.

-Grant



On Jan 3, 2008, at 11:45 AM, Dai, Chunhe wrote:

I have been searching online but could not find an exact answer, and I
am wondering if anyone here knows whether there is a preferred maximum
number of fields in a Lucene index?

We are in the process of deciding what our index will look like in our
Lucene integration. In one of our approaches, we could have a large
number of fields in the index, say several hundred. But each Document
in the index would not contain every one of those fields; it would only
have a few of them (probably in the tens). Does anyone have experience
with a setup like this? I am wondering whether there is a potential
performance issue with indexing and searching.

Thanks.
Chunhe

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




