Further on this... if metadata is added to a field type, it needs to
somehow make it down to the tokenizer and filter factories to use if
desired. Language, for example, could be attached to a field type,
but then could be leveraged by a stop word filter to pick up a
language-specific stop word file.
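To make that concrete, here is a rough sketch. None of these names are existing Solr APIs: LangAwareStopFilterFactory, its two-map init(...) signature, and the stopwords_<lang>.txt naming convention are all assumptions made up for illustration. The only point is that the factory can see the field type's attributes and derive its resource name from them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory, not a Solr class. It receives its own args plus
// the attributes of the enclosing field type.
class LangAwareStopFilterFactory {
    private String stopWordFile;

    // fieldTypeAttrs stands in for whatever mechanism would hand
    // field-type metadata down to the factory (an assumption).
    void init(Map<String, String> args, Map<String, String> fieldTypeAttrs) {
        String explicit = args.get("words");
        String lang = fieldTypeAttrs.get("lang");
        if (explicit != null) {
            stopWordFile = explicit;                     // explicit config wins
        } else if (lang != null) {
            stopWordFile = "stopwords_" + lang + ".txt"; // assumed naming convention
        } else {
            stopWordFile = "stopwords.txt";              // language-neutral fallback
        }
    }

    String getStopWordFile() {
        return stopWordFile;
    }

    public static void main(String[] unused) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("lang", "french");
        LangAwareStopFilterFactory factory = new LangAwareStopFilterFactory();
        factory.init(new HashMap<>(), attrs);
        System.out.println(factory.getStopWordFile()); // prints stopwords_french.txt
    }
}
```

An explicit "words" argument on the filter still takes precedence here, so existing configurations would behave as before.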
Food for thought.
Erik
On Jun 30, 2008, at 3:28 PM, J.J. Larrea wrote:
I heartily agree with you Grant that these objects should be user-
extensible. But removing the exception test entirely would probably
be a great disservice to Solr users, who could spend untold hours
debugging problems in schema.xml (e.g., misspelled or contextually
inappropriate properties) without the valuable feedback it
provides. So to do this right there should be a way to define
additional properties (defined as booleans in Solr) and attributes
(which can be string-valued).
Thinking aloud here...
For properties, something like this added to FieldProperties would
allow user-defined global properties:
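    // first bit not used by a built-in property
    final static int USER_DEFINED = 0x00010000;
    // bit position of the next free flag (16 for the mask above)
    static int nextIndex = Integer.numberOfTrailingZeros(USER_DEFINED);

    static int addPropertyType(String prop) {
      if( propertyMap.containsKey(prop) ) throw ...
      if( nextIndex > 31 ) throw ...
      int i = 1 << nextIndex++;   // turn the bit position into a flag
      propertyMap.put(prop, i);
      return i;
    }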
Which could be enabled by parsing a new <fieldProperty name="..."/>
tag from schema.xml before any of the fieldType or field declarations.
For string-valued attributes, FieldType could be extended with a Set
of user-defined names (or name/type mappings?) which would be
removed from initArgs before the exception test. The values could
be returned by a trivial method
    public String getAttribute(String name) {
      return args.get(name);
    }
so other code could repeatedly get access to them (initArgs entries are
progressively removed until the map is empty or an error is thrown, but
args persists) without having to parse and store the value somewhere.
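A minimal sketch of that flow, under stated assumptions: SketchFieldType is a stand-in, not the real FieldType; the "name" and "class" removals merely imitate the built-in argument handling; and addAttributeType is the hypothetical registration hook. It shows declared names being taken out of initArgs before the leftover check, while args stays intact for getAttribute.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for FieldType; not the real Solr class.
class SketchFieldType {
    // Global set of user-defined attribute names (hypothetical).
    static final Set<String> userAttributes = new HashSet<>();

    static void addAttributeType(String name) {
        if (!userAttributes.add(name)) {
            throw new IllegalArgumentException("duplicate attribute type: " + name);
        }
    }

    private Map<String, String> args = new HashMap<>();

    void setArgs(Map<String, String> configured) {
        args = new HashMap<>(configured);             // kept for getAttribute
        Map<String, String> initArgs = new HashMap<>(configured);
        initArgs.remove("name");                      // imitates built-in handling
        initArgs.remove("class");
        initArgs.keySet().removeAll(userAttributes);  // declared extras are legal
        if (!initArgs.isEmpty()) {                    // the exception test survives
            throw new RuntimeException("unknown fieldType arguments: " + initArgs);
        }
    }

    public String getAttribute(String name) {
        return args.get(name);
    }

    public static void main(String[] unused) {
        addAttributeType("lang");
        Map<String, String> cfg = new HashMap<>();
        cfg.put("name", "text");
        cfg.put("class", "solr.TextField");
        cfg.put("lang", "american");
        SketchFieldType ft = new SketchFieldType();
        ft.setArgs(cfg);
        System.out.println(ft.getAttribute("lang")); // prints american
    }
}
```

A misspelled name such as "lnag" is not in the declared set, so it is left in initArgs and still triggers the exception.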
Simplest would be for the attribute name set to be global across all
field-types, with a static addAttributeType method and a
freestanding tag in schema.xml similar to the above for properties.
But one could argue for the set of user-defined attributes to be
local to a particular fieldType and all fields defined from it,
perhaps set from an XML attribute:
    <!-- text fields have an attribute lang defaulting to 'american' -->
    <fieldType name="text" extra="lang" lang="american" ... />
    <field name="Prenom" type="text" lang="french" ... />
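How the per-type variant might resolve values can be sketched as follows. TypeDecl and FieldDecl are hypothetical stand-ins for the parsed fieldType and field elements, and treating extra="..." as a comma-separated list of names is an assumption. A field's own value wins; otherwise the field type's default applies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a parsed <fieldType .../> element.
class TypeDecl {
    final Set<String> extra = new HashSet<>();            // names listed in extra="..."
    final Map<String, String> defaults = new HashMap<>(); // per-type default values

    TypeDecl(Map<String, String> xmlAttrs) {
        String list = xmlAttrs.get("extra");
        if (list != null) {
            // Assumption: extra holds a comma-separated list of names.
            extra.addAll(Arrays.asList(list.trim().split("\\s*,\\s*")));
        }
        for (String name : extra) {
            if (xmlAttrs.containsKey(name)) {
                defaults.put(name, xmlAttrs.get(name));
            }
        }
    }
}

// Hypothetical stand-in for a parsed <field .../> element.
class FieldDecl {
    final TypeDecl type;
    final Map<String, String> overrides = new HashMap<>();

    FieldDecl(TypeDecl type, Map<String, String> xmlAttrs) {
        this.type = type;
        // Only names the field type declared as extra are picked up.
        for (String name : type.extra) {
            if (xmlAttrs.containsKey(name)) {
                overrides.put(name, xmlAttrs.get(name));
            }
        }
    }

    String getAttribute(String name) {
        String value = overrides.get(name);
        return value != null ? value : type.defaults.get(name);
    }
}

class ExtraAttributeDemo {
    public static void main(String[] unused) {
        Map<String, String> typeAttrs = new HashMap<>();
        typeAttrs.put("name", "text");
        typeAttrs.put("extra", "lang");
        typeAttrs.put("lang", "american");
        TypeDecl text = new TypeDecl(typeAttrs);

        Map<String, String> fieldAttrs = new HashMap<>();
        fieldAttrs.put("name", "Prenom");
        fieldAttrs.put("type", "text");
        fieldAttrs.put("lang", "french");
        System.out.println(new FieldDecl(text, fieldAttrs).getAttribute("lang")); // prints french
    }
}
```

A field of type "text" that does not set lang would fall back to 'american' in this sketch.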
Anyway, does this make sense and fit with what you were thinking of?
- J.J.
At 10:22 AM -0400 6/30/08, Grant Ingersoll wrote:
Currently, FieldType throws a RuntimeException if there are any
"extra" properties in the configuration. I think SchemaField does
something similar.
I'd like to consider not doing this. My main case is I want to be
able to store semantic information about the FieldType with the
FieldType. Doing this now requires creating a whole separate
object model that overlays the FieldType and stores the information
elsewhere (e.g., in a DB). For example, if you want to denote what
language a given field type supports, you have to store this
information elsewhere, when it could easily be seen as a property
of the FieldType. I think right now, people often rely on naming
conventions to convey this, such as text_zh or text_it or something
like that and that doesn't extend very well, IMO. These new
attributes would allow applications to make use of richer semantics
for FieldType w/o harming Solr in any way (I think).
From the looks of it, FieldType has all the functionality already
built in, minus a few lines where the exception is thrown if there
are "extra" attributes.
I think a similar argument can be made for SchemaField as well (and
probably other things like RequestHandler, etc. but "baby steps"
first)
Any thoughts/objections?
-Grant