[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464841#comment-16464841 ]
Abhishek Kumar Singh commented on SOLR-11741:
---------------------------------------------

To use LearnSchemaUpdateRequestProcessorFactory, just add it to the URP chain. The new API details are:

*1. Get a training ID:*

*GET* */<corename>/schema/train/start*

*Response:*
{code:java}
{"schemaTrainingId" : "<new schema training id>"}
{code}

*2. Start training:*

This API works like any other update API; the request body carries the documents to train with.

*POST* */<corename>/update?schemaTrainingId=<trainingId>*

{code:java}
Body: (Same as update request)
[{}]
{code}

*3. Get the schema trained so far:*

*GET* */schema/train/yield?schemaTrainingId=<currentTrainingId>*

*Response:*
{code:java}
{
  "schema":{
    "add-field-type": [
      { "name":<fieldname1>, "type":<type>, "multivalued":<true/false>},
      { "name":<fieldname2>, "type":<type>, "multivalued":<true/false>},
      ...
    ]
  }
}
{code}

> Offline training mode for schema guessing
> -----------------------------------------
>
>            Key: SOLR-11741
>            URL: https://issues.apache.org/jira/browse/SOLR-11741
>        Project: Solr
>     Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
>       Reporter: Ishan Chattopadhyaya
>       Priority: Major
>    Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
>                 SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
>                 screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
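
To make the three-step training flow in the comment easier to follow, here is a minimal Java sketch that only builds the request URIs and extracts the training ID from the start response. It is not part of the patch: the class name `SchemaTrainingUris`, its method names, and the base URL `http://localhost:8983/solr` are hypothetical. The comment shows the yield endpoint without a `/<corename>` prefix; the sketch assumes it is core-scoped like the other two endpoints, which may not match the actual implementation. It also assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds URIs for the training endpoints described in the comment. */
final class SchemaTrainingUris {
    // Matches "schemaTrainingId" : "<value>" in the response of the start endpoint.
    private static final Pattern TRAINING_ID =
            Pattern.compile("\"schemaTrainingId\"\\s*:\\s*\"([^\"]*)\"");

    private SchemaTrainingUris() {}

    /** Step 1: GET /<corename>/schema/train/start */
    static URI start(String solrBase, String core) {
        return URI.create(solrBase + "/" + core + "/schema/train/start");
    }

    /** Step 2: POST /<corename>/update?schemaTrainingId=<trainingId> */
    static URI update(String solrBase, String core, String trainingId) {
        return URI.create(solrBase + "/" + core
                + "/update?schemaTrainingId=" + encode(trainingId));
    }

    /**
     * Step 3: GET .../schema/train/yield?schemaTrainingId=<currentTrainingId>
     * Assumption: the path is core-scoped, like the start endpoint.
     */
    static URI trainedSchema(String solrBase, String core, String trainingId) {
        return URI.create(solrBase + "/" + core
                + "/schema/train/yield?schemaTrainingId=" + encode(trainingId));
    }

    /** Pulls the training ID out of the start response, if present. */
    static Optional<String> parseTrainingId(String startResponseJson) {
        Matcher m = TRAINING_ID.matcher(startResponseJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // URL-encodes the ID so it is safe to place in a query string.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A client would call `start(...)`, read the ID with `parseTrainingId(...)`, POST one or more document batches to `update(...)`, and finally GET `trainedSchema(...)` to retrieve the guessed schema. The regex-based parsing keeps the sketch dependency-free; a real client would more likely use a JSON library.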