[ 
https://issues.apache.org/jira/browse/UNOMI-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Schiefer updated UNOMI-471:
-------------------------------
    Description: 
In MergeProfilesOnPropertyAction.java 
(https://github.com/apache/unomi/blob/2baf16da141679b3f4fd12307b840982a4740592/plugins/baseplugin/src/main/java/org/apache/unomi/plugins/baseplugin/actions/MergeProfilesOnPropertyAction.java#L122), 
a sessionReassigned event is raised with the current event as the "source" 
argument and the currentSession as the "target" argument.  This is problematic 
because these entities, along with their nested child 
profile/session/properties, are then ALL mapped as new fields in Elasticsearch.

 

Imagine a profile with 50 text attributes (2 mappings per attribute: text and 
keyword) and 5 different consents (8 mappings per consent).  This alone adds 
50 x 2 + 5 x 8 = 140 mappings to the current event index.

 

On top of that, the current event is sent as the source argument of the 
sessionReassigned event.  Every source/target/property of the event that 
triggered the MergeProfilesOnProperty action is therefore also indexed under 
the "source" property of the sessionReassigned event.  This easily leads to 
hundreds more mappings in Elasticsearch, and the index quickly hits 
Elasticsearch's default mapping fields limit of 1000 
(index.mapping.total_fields.limit).
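
As a rough illustration of the arithmetic above, the sketch below counts the 
mappings contributed by one embedded profile and compares the result with the 
default limit.  The class and method names are hypothetical and are not part 
of Unomi; the figures are the ones from this description.

```java
public class MappingEstimate {
    // Default value of Elasticsearch's index.mapping.total_fields.limit.
    static final int DEFAULT_FIELD_LIMIT = 1000;

    // Mappings contributed by one profile: each text attribute and each
    // consent adds a fixed number of mapped fields.
    static int profileMappings(int textAttributes, int mappingsPerAttribute,
                               int consents, int mappingsPerConsent) {
        return textAttributes * mappingsPerAttribute
                + consents * mappingsPerConsent;
    }

    public static void main(String[] args) {
        // Figures from the description: 50 attributes x 2, 5 consents x 8.
        int perProfile = profileMappings(50, 2, 5, 8);
        System.out.println("mappings per embedded profile: " + perProfile);
        System.out.println("default field limit: " + DEFAULT_FIELD_LIMIT);
    }
}
```

Every additional copy of the profile nested under "source" or "target" 
repeats this cost under a new field path, which is why the limit is reached 
so quickly.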

 

Recommendation:

a: Only send event.itemId and event.itemType as the "source", and only send 
currentSession.itemId and currentSession.itemType as the "target", when 
creating the sessionReassigned event.

OR

b: Specify the default mapping on the event index ahead of time, including only 
the properties that are necessary in source and target, and set dynamic: false 
on those entities.  (This solution seems less optimal than solution a.)
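
A minimal sketch of option a follows.  It is an illustration only: the 
Identified interface and the slim() and item() helpers are hypothetical 
stand-ins, not Unomi's API, and an actual fix would have to pass whatever type 
the sessionReassigned event constructor expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlimReference {
    /** Hypothetical stand-in for an item exposing itemId and itemType. */
    interface Identified {
        String getItemId();
        String getItemType();
    }

    /** Keeps only itemId and itemType, dropping all nested properties. */
    static Map<String, String> slim(Identified item) {
        Map<String, String> ref = new LinkedHashMap<>();
        ref.put("itemId", item.getItemId());
        ref.put("itemType", item.getItemType());
        return ref;
    }

    /** Builds a sample item for the demo below. */
    static Identified item(String id, String type) {
        return new Identified() {
            public String getItemId() { return id; }
            public String getItemType() { return type; }
        };
    }

    public static void main(String[] args) {
        // Stand-ins for the triggering event and the current session.
        Map<String, String> source = slim(item("evt-1", "event"));
        Map<String, String> target = slim(item("sess-1", "session"));
        System.out.println("source=" + source + " target=" + target);
    }
}
```

With references like these, "source" and "target" each contribute two mapped 
fields regardless of how many profile attributes or consents exist.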
  

  was:
In the MergeProfilesOnPropertyAction.java 
(https://github.com/apache/unomi/blob/2baf16da141679b3f4fd12307b840982a4740592/plugins/baseplugin/src/main/java/org/apache/unomi/plugins/baseplugin/actions/MergeProfilesOnPropertyAction.java#L122)

There is a sessionReassigned event raised with the current event as the 
"source" and the currentSession as the "target" arguments.  This is problematic 
due to the fact that these entities (as well as their nested child 
profile/session/properties) are then ALL mapped as new fields in elastic.

 

If you can imagine a profile with 50 text attributes (with 2 mappings per 
attribute, text and keyword) and 5 different consents (with 8 different 
mappings per consent) - this would add an additional 140 mappings in the 
current event index. 

 

Now add the fact that the current event is sent as the source argument in the 
sessionReassigned event, you will have every source/target/property of the 
event that triggered the MergeProfilesOnProperty action additionally indexed as 
part of the "source" property of the sessionReassigned event, easily leading to 
hundreds more mappings in elasticsearch, and quickly hitting the default 
mapping fields limit of 1000 set by elasticsearch.

 

Recommendation:

Only send the event.itemId and event.itemType as the "source", and only send 
the currentSession.itemId and currentSession.itemType as the "target" when 
creating the sessionReassigned event
 


> sessionReassigned event causes mapping explosion in elastic
> -----------------------------------------------------------
>
>                 Key: UNOMI-471
>                 URL: https://issues.apache.org/jira/browse/UNOMI-471
>             Project: Apache Unomi
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.5.4
>            Reporter: Ben Schiefer
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
