[
https://issues.apache.org/jira/browse/METRON-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450042#comment-16450042
]
ASF GitHub Bot commented on METRON-1526:
----------------------------------------
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
> So, from the last few examples discussed it suggests to me that being a
> polyfield is actually a problem, but it's only part of the total reason for the
> problem. Under the hood they are setting stored=false and docValues=false for
> currency and some of the other polyfields, whereas they aren't doing this for
> LatLonType and Point. Does that sound about right? I saw mention of Date also,
> and some comments about LatLonType being the only problem data type, so it
> would be good to summarize again what field types are specifically a problem.
Yes that sounds right to me. The issue with Date was actually a result of
a copy field being defined with stored set to true. The only types that
returned extra fields in my testing were LatLonType and PointType.
> One bit I'm not completely clear on after the rounds of discussion is
> when it's desirable to return the dynamic/subfields generated by virtue of
> being a polyfield, and further how the doc update code is managing that
> differently from a normal search. For a standard doc query in context of
> performing an update, we don't want dynamically generated fields returned
> because that will bork the re-index. For a user glob query, we do want them
> returned because they may be useful to some users? Is that hovering around
> accurate?
The root of the problem is that updates in our DAOs involve reindexing the
whole document. If internal fields are returned in the lookup it will bork the
re-index. With a couple of exceptions, document lookups are done with the
SolrClient.getById method, which I believe returns all fields. I don't know
that there is a case where we want to return these dynamically generated
fields. In fact, I'm now convinced the LatLonType field was just not
configured correctly and is an isolated issue.
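To illustrate the "remove them before updating a document" option, here is a
minimal sketch of a filter that could run between the lookup and the re-index.
It is not the code in the PR. The class name PolyFieldFilter, the method
stripGeneratedSubfields, and the assumption that generated subfields always
look like <field>_<n>_coordinate (taken from the example in the issue
description) are all hypothetical, and the document is modeled as a plain Map
rather than a SolrJ type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolyFieldFilter {

    // Assumed naming convention for generated subfields, based on the example
    // in the issue description: <field>_0_coordinate, <field>_1_coordinate, ...
    private static final Pattern GENERATED_SUBFIELD =
            Pattern.compile("^(.+)_\\d+_coordinate$");

    /**
     * Returns a copy of the document without generated subfields.
     * A field is dropped only if it matches the assumed suffix pattern AND
     * its parent field is present in the same document, so an unrelated field
     * that merely happens to end in "_0_coordinate" is kept.
     */
    public static Map<String, Object> stripGeneratedSubfields(Map<String, Object> doc) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            Matcher m = GENERATED_SUBFIELD.matcher(entry.getKey());
            if (m.matches() && doc.containsKey(m.group(1))) {
                continue; // generated by the polyfield, do not send it back
            }
            cleaned.put(entry.getKey(), entry.getValue());
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Fields as returned by the lookup in the issue description.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("enrichments.geo.ip_dst_addr.location_point_0_coordinate", "33.4499");
        doc.put("enrichments.geo.ip_dst_addr.location_point_1_coordinate", "-112.0712");
        doc.put("enrichments.geo.ip_dst_addr.location_point", "33.4499,-112.0712");

        // Only the parent location field should remain for the re-index.
        System.out.println(stripGeneratedSubfields(doc).keySet());
    }
}
```

If the field was simply misconfigured, as suspected above, fixing the schema
would make a filter like this unnecessary.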
> @merrimanr Are there any circumstances where we'd want copy fields? It
> seems like we could just direct users to make a copy of the field in Metron and
> just index the copy fields directly. The main concern would be if we're able to
> manage globs in there, but even if we don't it doesn't seem like super
> important functionality to support. At least at a glance; I may definitely be
> missing use cases.
I believe copy fields are a commonly used feature in Solr. They can be useful
for indexing a value in different ways, with a different chain of analyzers.
> Location field types cause DocValuesField appear more than once error
> ---------------------------------------------------------------------
>
> Key: METRON-1526
> URL: https://issues.apache.org/jira/browse/METRON-1526
> Project: Metron
> Issue Type: Bug
> Reporter: Ryan Merriman
> Assignee: Ryan Merriman
> Priority: Major
>
> While testing [https://github.com/apache/metron/pull/970] I get this error
> when creating a meta alert:
> {code:java}
> Error from server at http://10.0.2.15:8983/solr/bro: Exception writing
> document id bbc150f5-92f8-485d-93cc-11730c1edf31 to the index; possible
> analysis error: DocValuesField
> \"enrichments.geo.ip_dst_addr.location_point_0_coordinate\" appears more than
> once in this document (only one value is allowed per field){code}
> I tracked it down to the fact that multiple fields are returned for a
> location field. For example when a field named
> "enrichments.geo.ip_dst_addr.location_point" is configured in a schema, these
> fields are returned in a query:
> {code:java}
> {
> "enrichments.geo.ip_dst_addr.location_point_0_coordinate": "33.4499",
> "enrichments.geo.ip_dst_addr.location_point_1_coordinate": "-112.0712",
> "enrichments.geo.ip_dst_addr.location_point": "33.4499,-112.0712"
> }
> {code}
> We need a way to either suppress these extra fields when querying or remove
> them before updating a document.