On Sat, Sep 18, 2010 at 8:52 PM, Marvin Humphrey <[email protected]> wrote:
> Greets,
>
> Right now, KinoSearch's FieldType subclasses have certain properties enabled
> by default.
>
>     FullTextType: indexed, stored
>     StringType:   indexed, stored
>     BlobType:     stored
>
> Having those defaults made the most common use cases for building a Schema
> slightly less verbose.  For instance, in the following example, a couple lines
> are not needed:
>
>     my $schema   = KinoSearch::Plan::Schema->new;
>     my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
>     my $type     = KinoSearch::Plan::FullTextType->new(
>         indexed       => 1,    # <--------------- not needed
>         stored        => 1,    # <--------------- not needed
>         highlightable => 1,
>         analyzer      => $analyzer,
>     );
>     $schema->spec_field(name => 'title',   type => $type);
>     $schema->spec_field(name => 'content', type => $type);
>
> However, I have come to believe that the advantages of succinctness do not
> outweigh the disadvantages of inconsistency, and that it would be better to
> have all properties default to false.

Huge +1 - consistency is crucial IMO.

> If all properties default to false, then it becomes easier to understand at a
> glance how a FieldType is configured, both when looking at code and when
> examining the schema_NNN.json file.  You don't need to take into account what
> the FieldType's class is, nor inspect carefully for missing keys.
>
> Furthermore, by having all properties default to false, we can implement them
> as bit-flags and have the C constructors for FieldType subclasses take a
> "flags" integer which defaults to 0.

I don't know whether that is really a good use case for a flags integer,
though. For something as high-level as FieldType, I would guess there will be
more than just boolean properties, maybe not now but in the future. I would
remind you to distinguish between the internal representation and the interface.
I don't mind having an efficient, compact representation internally, but for
the interface that already seems too specialized. I have a whole bunch of ideas
for FieldType, since I work on something similar in Lucene land, and I am happy
to share them. I still need to think about how far they apply to Lucy.

>     Analyzer *analyzer = (Analyzer*)Tokenizer_new(NULL);
>     uint32_t  flags    = (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE);
>     TextType *type     = TextType_new(analyzer, flags);
>
> If we change the defaults in Lucy, it will mean a back-compat break with
> KinoSearch.  However, we can minimize the disruption by consolidating
> FullTextType and StringType into a single, new TextType class.  Then, when
> KinoSearch schema.json files are read and fieldtypes are detected which are
> labeled "fulltext" or "string" instead of the new "text", we can just add the
> flags and invoke TextType's constructor.

While I see your point, I think we should not try to maintain backward
compatibility with KinoSearch. I had the impression that this is a fresh start;
please correct me if I am wrong. If we do maintain BW compat (what a pain,
man!), then +1.

> Since numeric types are not public yet in KS, that leaves only BlobType, which
> is rarely used.  My thinking is that it probably makes sense to just break
> back compat for BlobType.

+1

Are we already far enough along to talk about something like FieldType?

simon

>
> Marvin Humphrey
