[jira] [Created] (SOLR-6327) An UpdateProcessor to generate a best fit schema

Varun Thacker (JIRA) Wed, 06 Aug 2014 06:15:34 -0700

Varun Thacker created SOLR-6327:
-----------------------------------

             Summary: An UpdateProcessor to generate a best fit schema
                 Key: SOLR-6327
                 URL: https://issues.apache.org/jira/browse/SOLR-6327
             Project: Solr
          Issue Type: Improvement
            Reporter: Varun Thacker
            Priority: Minor



We should have an UpdateProcessor which takes in documents and learns the 
field types from them to generate a best fit schema automatically.

Quoting Hoss - "You wouldn't need/want a handler for this – you'd just need an 
UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that would 
look at the datatypes of the fields in each document w/o doing any indexing and 
pick the least common denominator.
So then you'd have a chain with all of your normal update processors including 
the TypeMapping processors configured with the precedence orders and locales 
and format strings you want – and at the end you'd have your 
BestFitSchemaGeneratorUpdateProcessorFactory that would look at all those docs, 
study their values, and throw them away – until a commit comes along, at which 
point it does all the under the hood schema field addition calls.
So to learn, you'd send docs using whatever handler/format you want (json, xml, 
extraction, etc...) with an update.chain=my.datatype.learning.processor.chain 
request param ... and once you've sent a bunch and given it a lot of variety 
to see, then you send a commit so it creates the schema and then you re-index 
your docs for real w/o that special chain."

That discussion took place in SOLR-6016.
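
To make the idea in the quote concrete, here is a rough Python sketch of the
"learn, discard, emit on commit" flow. It is not Solr code and not a proposed
implementation: the class name BestFitSchemaLearner, the type names, and the
widening rules (long + double gives double, any other mix gives string) are all
assumptions made up for illustration. It also assumes values arrive already
parsed into typed values by the earlier processors in the chain, and that every
field is single-valued.

```python
from datetime import datetime


def type_of(value):
    """Map an already-parsed value to a hypothetical field type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, datetime):
        return "date"
    return "string"


def widen(a, b):
    """Return the narrowest type that fits both a and b (assumed rules)."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # fallback: any other mix is stored as a string


class BestFitSchemaLearner:
    """Hypothetical stand-in for the processor at the end of the chain."""

    def __init__(self):
        self.field_types = {}  # field name -> best-fit type seen so far

    def process_add(self, doc):
        # Study the values, then drop the document: nothing is indexed.
        for name, value in doc.items():
            observed = type_of(value)
            seen = self.field_types.get(name)
            self.field_types[name] = observed if seen is None else widen(seen, observed)

    def process_commit(self):
        # A real processor would issue schema field additions here;
        # this sketch just returns the learned mapping.
        return dict(self.field_types)


if __name__ == "__main__":
    learner = BestFitSchemaLearner()
    learner.process_add({"id": "1", "price": 10, "in_stock": True})
    learner.process_add({"id": "2", "price": 9.5, "in_stock": "yes"})
    print(learner.process_commit())
```

In this sketch "price" ends up as double because both a long and a double were
seen, while "in_stock" falls back to string because a boolean and a string were
mixed. The more varied the sample documents, the better the fit, which is why
the quote suggests sending a lot of variety before the commit.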



--
This message was sent by Atlassian JIRA
(v6.2#6252)

