At 5:12 PM -0400 7/1/08, Grant Ingersoll wrote:
>You make a good point about the countless hours debugging. On the flip side,
>one could ask the question as to whether the Solr schema is stable enough that
>we should publish an XML Schema for it, thus helping alleviate some of the
>pain.

That's a very good point: a lot of the internal code-based validation of the .xml configuration files could be obviated by parse-time validation, and with a well-defined .xsd the schema itself would be user-extensible/restrictable.

>More below...

-- snip --

>This seems a bit clunky to me, syntax-wise, but the idea seems right. I
>suppose another option is that I could just extend the FieldType and have it
>look for my own attributes.

Well, for a specific field type there's already the init(...) method, designed to allow subclasses to parse and remove attributes before the bad-argument test, e.g. as done in CompressibleField. Where this won't work without a user-extensible dictionary is if one wants a new attribute across all field types. I did, and so had to modify FieldType itself, which was a bit clunky in a different way.

Either way, by adding a getAttribute to FieldType such as I described, it's only necessary for init (in FieldType or a subclass) to remove the argument from initArgs, so the attribute can be retrieved and parsed on demand rather than creating an instance variable to store it.

But stepping back, is language-dependent analysis really the goal? As Erik Hatcher notes, there is this complication:

>Further on this.... if metadata is added to a field type, it needs to somehow
>make it down to the tokenizer and filter factories to use if desired.
>Language, for example, could be attached to a field type, but then could be
>leveraged by a stop word filter to pick up a language-specific stop word file.

And perhaps what one really needs is not a static attribute added to the field type, but one that can vary across each document, e.g. via a different field's value or a payload affixed to the tokens. I remember a thread on payloads being used for that purpose (and I see you contributed to the Lucene-side design of payloads), but I don't recall whether it converged on a usable Solr-side implementation.

>I'll have to think some more about it...

Me too... the use-cases for the schema.xml-driven extension I proposed may be so rare that it's not at all worth considering.

- J.J.
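
For what it's worth, the init/getAttribute arrangement mentioned above could be sketched roughly as follows. This is a self-contained illustration only, not Solr's actual code: SketchFieldType, LanguageAwareFieldType and the "language" attribute are hypothetical names, and it assumes that setArgs keeps an untouched copy of the declared attributes so that getAttribute can still see whatever init removed from initArgs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a field type base class; NOT the real Solr FieldType.
class SketchFieldType {
    // Untouched copy of every attribute declared on the field type (assumption).
    private Map<String, String> attributes = new HashMap<>();

    // Keep a copy of the declared attributes, let init() consume the ones it
    // recognizes, then run the bad-argument test on whatever is left over.
    final void setArgs(Map<String, String> declared) {
        attributes = new HashMap<>(declared);
        Map<String, String> initArgs = new HashMap<>(declared);
        init(initArgs);
        if (!initArgs.isEmpty()) {
            throw new IllegalArgumentException("invalid arguments: " + initArgs);
        }
    }

    // Subclasses remove the attributes they understand from initArgs.
    protected void init(Map<String, String> initArgs) {
        // The base class recognizes no extra attributes in this sketch.
    }

    // Look the attribute up on demand; returns null if it was never declared.
    String getAttribute(String name) {
        return attributes.get(name);
    }
}

// Hypothetical subclass that accepts one extra attribute, "language".
class LanguageAwareFieldType extends SketchFieldType {
    @Override
    protected void init(Map<String, String> initArgs) {
        // Removing the key is all that is needed to pass the bad-argument
        // test; no instance variable is created to hold the value.
        initArgs.remove("language");
    }
}
```

With this shape, a field type declared with language="fr" passes the bad-argument test in LanguageAwareFieldType, and getAttribute("language") returns the raw string for whoever wants to parse it later. The same declaration on plain SketchFieldType is rejected, which is the case where one ends up modifying the base class.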
