On Feb 19, 2007, at 3:07 PM, Marvin Humphrey wrote:
On Feb 19, 2007, at 11:32 AM, Grant Ingersoll wrote:
FWIW, we support, in our in-house system and in addition to fixed
field semantics, completely dynamic field names for some
applications, but they have a fixed field type. So, the field
name can be anything, but the attributes of the field are fixed
(i.e., it will always be tokenized with norms). This is useful for
us, in some cases, when indexing XML files where the tag name
becomes the field name and the set of tag names is not known
ahead of time. I suppose there are ways around this (by
preprocessing all the files), but having the ability to add
arbitrary fields is a good thing for us and some of the
applications we do.
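
To make the arrangement described above concrete, here is a minimal sketch of a schema that has some fixed fields plus one catch-all type for any other field name. Everything in it (Schema, FieldType, type_for) is a hypothetical name made up for illustration; it is not Lucene's or KinoSearch's API, and the in-house system may well work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldType:
    """Hypothetical bundle of per-field attributes."""
    tokenized: bool
    norms: bool


class Schema:
    """Hypothetical schema: fixed fields plus an optional catch-all type."""

    def __init__(self, dynamic_type=None):
        self.fixed = {}                   # field name -> FieldType
        self.dynamic_type = dynamic_type  # type used for any unknown name

    def add_field(self, name, field_type):
        self.fixed[name] = field_type

    def type_for(self, name):
        # Fixed fields win; any other name falls back to the single dynamic type.
        if name in self.fixed:
            return self.fixed[name]
        if self.dynamic_type is None:
            raise KeyError(f"unknown field: {name!r}")
        return self.dynamic_type


# The field name can be anything, but its attributes are always the same.
schema = Schema(dynamic_type=FieldType(tokenized=True, norms=True))
schema.add_field("id", FieldType(tokenized=False, norms=False))
```

With no dynamic_type set, the same class rejects unknown names, which is the stricter behavior discussed in the reply that follows.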
The thing I don't like about this is that it prevents validation of
field names, which is something I use a lot in KS (e.g. try to
delete a term from a field that's not indexed, get an error, as the
field name was probably misspelled). I can see the use, but it
means sacrificing a lot of type safety for the more common cases.
The user base at large has to suffer more frequent, hard-to-detect
bugs for a feature only needed by a few users.
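
A rough sketch of the kind of validation meant here, assuming a toy in-memory index: deleting by term from a field that was never declared raises an error instead of silently matching nothing. StrictIndex, UnknownFieldError and delete_by_term are hypothetical names for illustration only, not the actual KS interface.

```python
class UnknownFieldError(KeyError):
    """Raised when a field name is not part of the declared schema."""


class StrictIndex:
    """Hypothetical toy index that validates every field name it is given."""

    def __init__(self, indexed_fields):
        self.indexed_fields = frozenset(indexed_fields)
        # field name -> term -> set of doc ids
        self.postings = {name: {} for name in self.indexed_fields}

    def _check(self, field):
        if field not in self.indexed_fields:
            raise UnknownFieldError(f"no indexed field named {field!r}")

    def add(self, doc_id, field, term):
        self._check(field)
        self.postings[field].setdefault(term, set()).add(doc_id)

    def delete_by_term(self, field, term):
        """Remove a term's postings; return how many docs were affected."""
        self._check(field)
        return len(self.postings[field].pop(term, set()))


index = StrictIndex(["title", "body"])
index.add(1, "title", "lucene")
# index.delete_by_term("titel", "lucene") would raise UnknownFieldError,
# whereas a fully dynamic schema would just report zero deletions.
```

With arbitrary field names allowed, the _check step has nothing to check against, which is the trade-off being described.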
Since all our dynamically named fields are of the same type, it isn't
an issue for us at the moment. Then again, we only have in-house
users and don't have the same issue that you have.
About your app in particular -- how do you handle identical XML tag
names that mean totally different things when nested inside
different elements?
It doesn't happen. The tags are based on the output of some other
processes and are unique, and each tag/field name has semantics
attached to it that are meaningful to the application. I suppose,
technically, they are known ahead of time, but there are potentially
hundreds of them, so it doesn't make sense to populate them into
our Field schema ahead of time; maintenance would be a nightmare.
<company>
  <name>Acme</name>
</company>
<product>
  <name>Widget</name>
</product>
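
For the collision in the snippet above, one possible workaround (not something either system in this thread is said to do) is to derive the field name from the element path rather than the bare tag. The sketch below uses Python's standard xml.etree.ElementTree; the fields_from_xml helper, the qualify flag and the wrapping <doc> root element are assumptions added so the example parses as a single document.

```python
import xml.etree.ElementTree as ET


def fields_from_xml(xml_text, qualify=False):
    """Map leaf elements to field names (hypothetical helper).

    qualify=False uses the bare tag name, so unrelated tags can collide.
    qualify=True joins the ancestor tags with dots, e.g. 'company.name'.
    """
    root = ET.fromstring(xml_text)
    fields = {}

    def walk(elem, path):
        children = list(elem)
        if not children:  # leaf element: its text becomes a field value
            name = ".".join(path) if qualify else path[-1]
            fields.setdefault(name, []).append((elem.text or "").strip())
        for child in children:
            walk(child, path + [child.tag])

    for child in root:
        walk(child, [child.tag])
    return fields


# <doc> is an assumed wrapper; the original snippet has two top-level elements.
SAMPLE = (
    "<doc>"
    "<company><name>Acme</name></company>"
    "<product><name>Widget</name></product>"
    "</doc>"
)

bare = fields_from_xml(SAMPLE)                     # both values land in 'name'
qualified = fields_from_xml(SAMPLE, qualify=True)  # 'company.name', 'product.name'
```

Path-qualified names keep the two meanings apart, at the cost of multiplying the number of distinct field names even further.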
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ