Varun Thacker created SOLR-6327:
-----------------------------------
Summary: An UpdateProcessor to generate a best fit schema
Key: SOLR-6327
URL: https://issues.apache.org/jira/browse/SOLR-6327
Project: Solr
Issue Type: Improvement
Reporter: Varun Thacker
Priority: Minor
We should have an UpdateProcessor which takes in documents and learns the types
from it to generate a best fit schema automatically.
Quoting Hoss - "You wouldn't need/want a handler for this – you'd just need an
UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that would
look at the datatpes of the fields in each document w/o doing any indexing and
pick the least common denominator.
So then you'd have a chain with all of your normal update processors including
the TypeMapping processors configured with the preccedence orders and locales
and format strings you want – and at the end you'd have your
BestFitScheamGeneratorUpdateProcessorFactory that would look at all those docs,
study their values, and throw them away – until a commit comes along, at which
point it does all the under the hood schema field addition calls.
So do learn, you'd send docs using whatever handler/format you wnat (json, xml,
extraction, etc...) with an update.chain=my.datatype.learning.processor.chain
request param ... and once you've sent a bunch and giving it a lot of variety
to see, then you send a commit so it creates the schema and then you re-index
your docs for real w/o that special chain."
That discussion took place in SOLR-6016
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]