Hi,

Whether a field is tokenized depends on whether an analyzer is used for the field (non-primitive types only), and how it is tokenized depends on which tokenizer that analyzer defines. Tokenizing a hostname doesn't really make sense with the default available tokenizers, but you can use a KeywordTokenizer with a WordDelimiterFilter to split it into its domain levels (TLD, SLD, etc.). Having the TLD in the same field isn't very useful for boosting or for query-time analysis of search words, though: people usually don't search for a TLD, and if they do, it should be boosted separately.
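To make the split concrete, here is a minimal plain-Python sketch of the idea. It is not Solr or Lucene code and does not reproduce the exact behaviour of KeywordTokenizer or WordDelimiterFilter. It only shows the hostname being treated as one value, split into its labels, and the TLD kept apart so it could go into its own field. The function names and the `host_parts` and `tld` keys are hypothetical, and treating the last label as the TLD is a simplifying assumption (multi-label suffixes such as co.uk are not handled).

```python
def split_hostname(hostname):
    """Treat the whole hostname as one value, then split it on dots.

    Illustration only: this mimics the general effect of keeping the
    input as a single token and then breaking it at delimiters.
    """
    cleaned = hostname.strip().strip(".").lower()
    return [label for label in cleaned.split(".") if label]


def host_fields(hostname):
    """Separate the TLD from the remaining labels (hypothetical field names)."""
    labels = split_hostname(hostname)
    if len(labels) < 2:
        # Single-label hosts (e.g. "localhost") have no TLD to separate.
        return {"host_parts": labels, "tld": None}
    # Assumption: the last label is the TLD.
    return {"host_parts": labels[:-1], "tld": labels[-1]}


if __name__ == "__main__":
    print(split_hostname("www.Example.com"))
    print(host_fields("www.example.com"))
```

Keeping the TLD out of the main field means it can be indexed and boosted on its own, if at all.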
About the Solr4 schema: it wasn't introduced as a Solr4-compatible version of the default schema.xml file, and I think it should be removed in favour of updating schema.xml to Solr4. The only change I can think of is adding the version field that is mandatory for SolrCloud. The schema version is 1.5, which the default schema already has.

Cheers

-----Original message-----
> From:Lewis John Mcgibbney <[email protected]>
> Sent: Tue 07-Aug-2012 00:03
> To: [email protected]
> Subject: Re: Understanding mapping of field characteristics to index structure
>
> Mmmm...
>
> I think I opened a small can of worms here regarding consistency
> between schema.xml and schema-solr4.xml.
>
> There are discrepancies between some fields as to their structural
> characteristics. This is something which I think we should make
> consistent between schemas... no?
>
> An example would be the content field (used in index-basic) which
> appears as stored and indexed in schema-solr4.xml but not stored in
> schema.xml
>
> Lewis
>
> On Mon, Aug 6, 2012 at 10:50 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > Hi,
> >
> > Simple question but currently unclear to me...
> >
> > I know if a field e.g. 'host' is going to be stored and/or indexed as
> > all I need to do is look this up or define it within my schema,
> > however what about tokenised? This seems (to me anyway) to be shrouded
> > in mystery :0|
> >
> > Any thoughts? Thank you
> >
> > Best
> > Lewis
> >
> > --
> > Lewis
> >
>
> --
> Lewis
>

