[ 
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266547#comment-14266547
 ] 

Erik Hatcher commented on SOLR-6913:
------------------------------------

bq. We need the auto-field guessing to also guess and set multiValued, seems 
like.

I gave this a whirl (patch pasted below as code, for posterity) and did not 
like it.  The data I tried always had some field that comes in as a single 
value at first (or even as a single-valued array in, say, JSON; that seems 
indistinguishable from a single value at this update processor level) and 
then, more confusingly, starts coming in with multiple values after a handful 
of documents have gone in successfully.  This is a prime example of where 
"guessing" this stuff (at least from a sample size of a single field value) 
turns out, more often than not, to be incorrect or inappropriate somewhere 
along the way with real data.  It's easiest, I'll echo, to just assume 
multiValued on new fields.  No worries: this is why it's now been made easy 
to nudge these things with something as straightforward as this when setting 
things up:

{code}
curl http://localhost:8983/solr/films/schema/fields -X POST \
  -H 'Content-type:application/json' --data-binary '
[
    {
        "name":"name",
        "type":"text_general",
        "stored":true
    }
]'
{code}

{code}
Index: core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java
===================================================================
--- core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java   (revision 1649842)
+++ core/src/java/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.java   (working copy)
@@ -39,8 +39,10 @@
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 
 import static org.apache.solr.common.SolrException.ErrorCode.BAD_REQUEST;
@@ -279,8 +281,16 @@
         FieldNameSelector selector = buildSelector(oldSchema);
         for (final String fieldName : doc.getFieldNames()) {
           if (selector.shouldMutate(fieldName)) { // returns false if the field already exists in the current schema
-            String fieldTypeName = mapValueClassesToFieldType(doc.getField(fieldName));
-            newFields.add(oldSchema.newField(fieldName, fieldTypeName, Collections.<String,Object>emptyMap()));
+            SolrInputField value = doc.getField(fieldName);
+            String fieldTypeName = mapValueClassesToFieldType(value);
+            Map<String,Object> options = new HashMap<>();
+
+            if (value.getValueCount() > 1) {
+              options.put("multiValued", true);
+            }
+
+            SchemaField newField = oldSchema.newField(fieldName, fieldTypeName, options);
+            newFields.add(newField);
           }
         }
         if (newFields.isEmpty()) {
{code}
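
To illustrate why the heuristic in the patch above misfires, here is a toy sketch that uses plain Java collections rather than Solr classes. The class and method names (MultiValuedGuessDemo, guessMultiValued, firstRejectedDoc) and the sample "genre" data are made up for illustration; the only behavior assumed from Solr is that a field defined as single-valued rejects a document carrying more than one value for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of schema guessing; none of these names come from Solr. */
public class MultiValuedGuessDemo {

    /** Mirrors the patch's check: guess multiValued only if the sample holds more than one value. */
    static boolean guessMultiValued(List<?> sampleValues) {
        return sampleValues.size() > 1;
    }

    /**
     * Feeds documents through a schema in which each field's multiValued flag is
     * fixed by the first document that mentions the field.
     * Returns the index of the first document that violates the schema, or -1 if none does.
     */
    static int firstRejectedDoc(List<Map<String, List<String>>> docs, boolean alwaysMultiValued) {
        Map<String, Boolean> schema = new HashMap<>(); // field name -> multiValued
        for (int i = 0; i < docs.size(); i++) {
            for (Map.Entry<String, List<String>> field : docs.get(i).entrySet()) {
                List<String> values = field.getValue();
                // The flag is decided once, when the field is first seen, and never revisited.
                boolean multiValued = schema.computeIfAbsent(field.getKey(),
                        name -> alwaysMultiValued || guessMultiValued(values));
                if (!multiValued && values.size() > 1) {
                    return i; // a single-valued field received several values
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(Map.of("genre", List.of("Drama")));             // looks single-valued
        docs.add(Map.of("genre", List.of("Comedy")));            // still looks single-valued
        docs.add(Map.of("genre", List.of("Drama", "Thriller"))); // multiple values show up late

        System.out.println("guessing from first value: rejected doc index = "
                + firstRejectedDoc(docs, false));
        System.out.println("assuming multiValued:      rejected doc index = "
                + firstRejectedDoc(docs, true));
    }
}
```

With this sample data the guessing variant accepts the first two documents and rejects the third (index 2), while the variant that assumes multiValued reports -1, meaning nothing is rejected.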

> audit & cleanup "schema" in data_driven_schema_configs
> ------------------------------------------------------
>
>                 Key: SOLR-6913
>                 URL: https://issues.apache.org/jira/browse/SOLR-6913
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hoss Man
>            Assignee: Steve Rowe
>            Priority: Blocker
>             Fix For: 5.0
>
>         Attachments: SOLR-6913-trim-schema.patch
>
>
> the data_driven_schema_configs configset has some issues that should be 
> reviewed carefully & cleaned up...
> * currently includes a schema.xml file:
> ** this was previously part of the old example to show the automatic 
> "bootstrapping" of schema.xml -> managed-schema, but at this point it's just 
> kind of confusing
> ** we should just rename this to "managed-schema" in svn - the ref guide 
> explains the bootstrapping
> * the effective schema as it currently stands includes a bunch of copyFields 
> & dynamicFields that are taken wholesale from the techproducts example
> ** some of these might make sense to keep in a general example (ie: "\*_txt") 
> but in general they should all be reviewed.
> ** a bunch of this cruft is actually commented out already, but anything we 
> don't want to keep should be removed to eliminate confusion
> * SOLR-6471 added an explicit "_text" field as the default and made it a 
> copyField catchall (ie: "\*")
> ** the ref guide schema API example responses need to reflect the existence 
> of this field: 
> https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
> ** we should draw heavy attention to this field+copyField -- both with a "/!\ 
> NOTE" in the refguide and call it out in solrconfig.xml & "managed-schema" 
> file comments since people who start with these configs may be surprised and 
> wind up with a very bloated index



