[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316913#comment-16316913 ]
Cassandra Targett commented on SOLR-11741:
------------------------------------------

bq. What i suggested at one point (I don't remember where ... it may already be in a jira somewhere?) was an UpdateRequestProcessorFactory that could be configured instead of RunUpdateProcessorFactory in a chain...

The issue where Hoss mentioned this idea before was SOLR-6939. I linked it here for reference.

> Offline training mode for schema guessing
> -----------------------------------------
>
>                 Key: SOLR-11741
>                 URL: https://issues.apache.org/jira/browse/SOLR-11741
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>         Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> screenshot-1.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field has alphanumeric contents in a later document, those documents are
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.