[ 
https://issues.apache.org/jira/browse/METRON-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448876#comment-16448876
 ] 

ASF GitHub Bot commented on METRON-1526:
----------------------------------------

Github user justinleet commented on the issue:

    https://github.com/apache/metron/pull/995
  
    @merrimanr Let me replay my understanding to see if I'm on the right track.
    
    The problem is that a glob query ("*") returns fields that we can't send 
back when we reindex the whole document. In particular, the ones we've seen are 
the subfields of LatLon.  The _coordinate fields come back in a search, but we 
can't reindex them.
    
    A field will come back if it is either
    * stored (these are returned normally), or
    * a docValues field that isn't stored, which is returned in the case of a 
glob query per the Solr docs:
      >Field values retrieved during search queries are typically returned from 
stored values. However, non-stored docValues fields will be also returned along 
with other stored fields when all fields (or pattern matching globs) are 
specified to be returned (e.g. “fl=*”)
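    
    As a rough sketch of that rule (plain Java, not Solr's API; the class, 
method, and field names below are hypothetical and only model the two flags 
discussed above):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the rule quoted above; this is not Solr code.
final class GlobFieldModel {

    // Minimal stand-in for a schema field: a name plus the two flags that matter here.
    static final class FieldDef {
        final String name;
        final boolean stored;
        final boolean docValues;

        FieldDef(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    // Assumption: a field is returned for fl=* if it is stored,
    // or if it has docValues even though it is not stored.
    static boolean isReturnedByGlob(FieldDef field) {
        return field.stored || field.docValues;
    }

    // Names of the fields that a glob query would return, in input order.
    static List<String> returnedByGlob(List<FieldDef> fields) {
        List<String> names = new ArrayList<>();
        for (FieldDef field : fields) {
            if (isReturnedByGlob(field)) {
                names.add(field.name);
            }
        }
        return names;
    }
}
```

    Under this model, a stored field and a docValues-only field are both 
returned, while a field that is neither stored nor docValues is not.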
    
    This is why setting the dynamic field solves the problem (it makes them 
both not stored and not docValues).
    
    Is this correct so far?
    
    So I dug the slightest bit into the Lucene/Solr source for the Currency 
field (as a specific example of a nonproblematic field per your test).
    
    Here's a snippet of 
    
    ```
      private void createDynamicCurrencyField(String suffix, FieldType type) {
        String name = "*" + POLY_FIELD_SEPARATOR + suffix;
        Map<String, String> props = new HashMap<>();
        props.put("indexed", "true");
        props.put("stored", "false");
        props.put("multiValued", "false");
        props.put("omitNorms", "true");
        int p = SchemaField.calcProps(name, type, props);
        schema.registerDynamicFields(SchemaField.create(name, type, p, null));
      }
    
    ...
      @Override
      public void inform(IndexSchema schema) {
        this.schema = schema;
        createDynamicCurrencyField(FIELD_SUFFIX_CURRENCY,   fieldTypeCurrency);
        createDynamicCurrencyField(FIELD_SUFFIX_AMOUNT_RAW, fieldTypeAmountRaw);
      }
    ```
    
    What's interesting is that it appears to create an entirely new dynamic 
field, `*____currency`, to catch everything under the hood.  This field is not 
stored and uses the default for docValues, which is false.
    
    Output from a LukeRequest similar to the test above:
    ```
    KEY: *____currency
    VALUE NAME: *____currency
    FLAGS: [INDEXED, OMIT_NORMS]
    
    KEY: *____amount_raw
    VALUE NAME: *____amount_raw
    FLAGS: [INDEXED, OMIT_NORMS]
    
    KEY: *
    VALUE NAME: *
    FLAGS: [DOC_VALUES, OMIT_NORMS, OMIT_TF]
    
    KEY: *.c
    VALUE NAME: *.c
    FLAGS: [INDEXED, STORED, OMIT_TF]
    ```
    
    Note that only the catch-all has the docValues flag; the custom type's 
currency and amount_raw fields do not.
    
    The short version is that it seems like the majority of the default types 
manage their subfields more reasonably and therefore aren't a problem.
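    
    For the LatLon case, one of the options from the issue description is to 
remove the extra fields before updating a document. A minimal sketch of that 
idea, assuming the document is a plain map and that the subfields can be 
recognized by a `_coordinate` suffix (the class and method names are 
hypothetical, and this is not necessarily what the PR does):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; illustrates dropping LatLon subfields before a reindex.
final class CoordinateFieldFilter {

    // Assumption: the generated subfields all end with this suffix,
    // e.g. "..._0_coordinate" and "..._1_coordinate".
    static final String COORDINATE_SUFFIX = "_coordinate";

    // Returns a copy of the document without the generated coordinate subfields.
    // The input map is left untouched.
    static Map<String, Object> stripCoordinateSubfields(Map<String, Object> doc) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (!entry.getKey().endsWith(COORDINATE_SUFFIX)) {
                cleaned.put(entry.getKey(), entry.getValue());
            }
        }
        return cleaned;
    }
}
```

    Applied to the three fields from the issue description, this would keep 
only `enrichments.geo.ip_dst_addr.location_point`. A suffix match is crude and 
would also drop any real field that happens to end in `_coordinate`.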
    



> Location field types cause DocValuesField appear more than once error
> ---------------------------------------------------------------------
>
>                 Key: METRON-1526
>                 URL: https://issues.apache.org/jira/browse/METRON-1526
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ryan Merriman
>            Assignee: Ryan Merriman
>            Priority: Major
>
> While testing [https://github.com/apache/metron/pull/970] I get this error 
> when creating a meta alert:
> {code:java}
> Error from server at http://10.0.2.15:8983/solr/bro: Exception writing 
> document id bbc150f5-92f8-485d-93cc-11730c1edf31 to the index; possible 
> analysis error: DocValuesField 
> \"enrichments.geo.ip_dst_addr.location_point_0_coordinate\" appears more than 
> once in this document (only one value is allowed per field){code}
> I tracked it down to the fact that multiple fields are returned for a 
> location field.  For example when a field named 
> "enrichments.geo.ip_dst_addr.location_point" is configured in a schema, these 
> fields are returned in a query:
> {code:java}
> {
> "enrichments.geo.ip_dst_addr.location_point_0_coordinate": "33.4499",
> "enrichments.geo.ip_dst_addr.location_point_1_coordinate": "-112.0712",
> "enrichments.geo.ip_dst_addr.location_point": "33.4499,-112.0712"
> }
> {code}
>  We need a way to either suppress these extra fields when querying or remove 
> them before updating a document. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
