[ https://issues.apache.org/jira/browse/UNOMI-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Schiefer updated UNOMI-471:
-------------------------------
Description:
In MergeProfilesOnPropertyAction.java
(https://github.com/apache/unomi/blob/2baf16da141679b3f4fd12307b840982a4740592/plugins/baseplugin/src/main/java/org/apache/unomi/plugins/baseplugin/actions/MergeProfilesOnPropertyAction.java#L122),
a sessionReassigned event is raised with the current event as the "source"
argument and the currentSession as the "target" argument. This is problematic
because these entities, along with their nested profile, session, and
properties, are then ALL mapped as new fields in Elasticsearch.
For example, a profile with 50 text attributes (2 mappings per attribute, text
and keyword) and 5 different consents (8 mappings per consent) would add
50 x 2 + 5 x 8 = 140 mappings to the current event index.
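The arithmetic above can be checked with a small sketch. The class and method
names are hypothetical and only illustrate the count; the limit of 1000 is the
Elasticsearch default mentioned in this issue.

```java
// Hypothetical helper, not part of Unomi: estimates how many field mappings
// a single embedded profile adds to the event index.
class MappingEstimate {

    static int estimate(int textAttributes, int mappingsPerAttribute,
                        int consents, int mappingsPerConsent) {
        return textAttributes * mappingsPerAttribute
                + consents * mappingsPerConsent;
    }

    public static void main(String[] args) {
        int added = estimate(50, 2, 5, 8); // figures from the example above
        int defaultLimit = 1000;           // default mapped-field limit per index
        System.out.println("Added mappings: " + added);
        System.out.println("Share of default limit: "
                + (100 * added / defaultLimit) + "%");
    }
}
```

With the figures above, one embedded profile alone accounts for 14% of the
default limit, before the nested "source" event is counted.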
On top of that, because the current event is sent as the "source" argument of
the sessionReassigned event, every source/target/property of the event that
triggered the MergeProfilesOnProperty action is also indexed under the "source"
property of the sessionReassigned event. This can easily add hundreds more
mappings and quickly hit Elasticsearch's default limit of 1000 mapped fields
per index.
Recommendation:
a: Only send event.itemId and event.itemType as the "source", and only send
currentSession.itemId and currentSession.itemType as the "target", when
creating the sessionReassigned event.
OR
b: Define the mapping for the event index ahead of time, including only the
properties that are needed in source and target, and set dynamic: false on
those objects (this seems less desirable than solution a).
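A minimal sketch of option a, assuming stand-in types: SketchItem and
SessionReassignedSketch are hypothetical and are not the real Unomi classes or
the actual fix. The idea is only to show the event carrying references
(itemId and itemType) instead of full nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Unomi item (event, session, ...); not the real class.
class SketchItem {
    final String itemId;
    final String itemType;
    final Map<String, Object> properties; // nested data that would otherwise be mapped

    SketchItem(String itemId, String itemType, Map<String, Object> properties) {
        this.itemId = itemId;
        this.itemType = itemType;
        this.properties = properties;
    }
}

class SessionReassignedSketch {

    // Keep only the two identifying fields; nested properties are dropped.
    static Map<String, Object> toReference(SketchItem item) {
        Map<String, Object> ref = new LinkedHashMap<>();
        ref.put("itemId", item.itemId);
        ref.put("itemType", item.itemType);
        return ref;
    }

    // Build the event payload with lightweight source/target references.
    static Map<String, Object> build(SketchItem event, SketchItem currentSession) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("eventType", "sessionReassigned");
        payload.put("source", toReference(event));
        payload.put("target", toReference(currentSession));
        return payload;
    }
}
```

With this shape, "source" and "target" each contribute two fields to the event
index mapping regardless of how many properties the original objects carry.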
was:
In the MergeProfilesOnPropertyAction.java
(https://github.com/apache/unomi/blob/2baf16da141679b3f4fd12307b840982a4740592/plugins/baseplugin/src/main/java/org/apache/unomi/plugins/baseplugin/actions/MergeProfilesOnPropertyAction.java#L122)
There is a sessionReassigned event raised with the current event as the
"source" and the currentSession as the "target" arguments. This is problematic
due to the fact that these entities (as well as their nested child
profile/session/properties) are then ALL mapped as new fields in elastic.
If you can imagine a profile with 50 text attributes (with 2 mappings per
attribute, text and keyword) and 5 different consents (with 8 different
mappings per consent) - this would add an additional 140 mappings in the
current event index.
Now add the fact that the current event is sent as the source argument in the
sessionReassigned event, you will have every source/target/property of the
event that triggered the MergeProfilesOnProperty action additionally indexed as
part of the "source" property of the sessionReassigned event, easily leading to
hundreds more mappings in elasticsearch, and quickly hitting the default
mapping fields limit of 1000 set by elasticsearch.
Recommendation:
Only send the event.itemId and event.itemType as the "source", and only send
the currentSession.itemId and currentSession.itemType as the "target" when
creating the sessionReassigned event
> sessionReassigned event causes mapping explosion in elastic
> -----------------------------------------------------------
>
> Key: UNOMI-471
> URL: https://issues.apache.org/jira/browse/UNOMI-471
> Project: Apache Unomi
> Issue Type: Bug
> Components: core
> Affects Versions: 1.5.4
> Reporter: Ben Schiefer
> Priority: Critical