[ 
https://issues.apache.org/jira/browse/UNOMI-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844598#comment-16844598
 ] 

Serge Huber commented on UNOMI-172:
-----------------------------------

Hello Jonathan, 

The aggregateQueryBucketSize limit applies only when you are building 
segments using conditions that query past events, for example: give me all the 
profiles that have viewed a page in the last 30 days. You can get around this 
limit by using a rule to set a profile property with the last page-view date, 
and then building the segment on that property instead.
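
As a rough sketch of that workaround (the rule id, property name, and exact 
parameter names here are illustrative assumptions, not taken from the issue), a 
Unomi rule reacting to view events could stamp the profile with a last-view 
date using setPropertyAction:

```json
{
  "metadata": {
    "id": "setLastPageViewDate",
    "name": "Set last page view date on profile",
    "scope": "systemscope"
  },
  "condition": {
    "type": "eventTypeCondition",
    "parameterValues": {
      "eventTypeId": "view"
    }
  },
  "actions": [
    {
      "type": "setPropertyAction",
      "parameterValues": {
        "setPropertyName": "properties.lastPageViewDate",
        "setPropertyValue": "now"
      }
    }
  ]
}
```

This assumes setPropertyAction accepts the special value "now" for the current 
timestamp; check the Unomi documentation for the exact parameter names in your 
version.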

The problem with the limit is due to a sort of "join" that is performed when 
using pastEventConditions. In order to execute the join, Unomi has to perform 
multiple ElasticSearch queries and feed the result of the first (all the 
matching profileIds) into the second query. Solving this properly is a hard 
problem; it is one of the most difficult problems in database optimization and 
usually requires implementing some sort of query optimizer with statistics to 
predict query execution, so that the query runs optimally and with minimal 
resources.
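
To illustrate the shape of that join, here is a plain in-memory sketch (not 
Unomi's actual ElasticSearch code; the data layout and function name are made 
up) showing the two sequential query phases:

```python
from datetime import datetime, timedelta

def profiles_with_recent_event(events, profiles, event_type, days):
    """Two-phase 'join': first collect the profileIds of matching events,
    then filter profiles by those ids -- mirroring the two sequential
    ElasticSearch queries Unomi issues for a pastEventCondition."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    # Phase 1: query events and aggregate the matching profileIds
    # (in Unomi this aggregation is what aggregateQueryBucketSize caps).
    matching_ids = {e["profileId"] for e in events
                    if e["type"] == event_type and e["timestamp"] >= cutoff}
    # Phase 2: feed those ids into a second query over the profiles.
    return [p for p in profiles if p["id"] in matching_ids]
```

The cost that does not scale is phase 2's id list: every matching profileId 
from phase 1 has to be carried into the second query.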

Using the rule-based solution makes it easy to scale to millions or even 
hundreds of millions of profiles, since only a single ElasticSearch query on 
the profiles index needs to be executed.
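
With the property in place, the segment condition becomes a simple profile 
property check instead of a pastEventCondition, roughly like this (illustrative 
sketch; the property name and date-expression parameter are assumptions to 
verify against your Unomi version):

```json
{
  "type": "profilePropertyCondition",
  "parameterValues": {
    "propertyName": "properties.lastPageViewDate",
    "comparisonOperator": "greaterThanOrEqualTo",
    "propertyValueDateExpr": "now-30d"
  }
}
```

This is the single-query shape: ElasticSearch evaluates it directly against the 
profiles index, with no intermediate profileId list to carry between queries.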

Regards,
  Serge... 

> Better way to create large profile segments
> -------------------------------------------
>
>                 Key: UNOMI-172
>                 URL: https://issues.apache.org/jira/browse/UNOMI-172
>             Project: Apache Unomi
>          Issue Type: Improvement
>            Reporter: Don Hinshaw
>            Priority: Minor
>
> Right now the aggregateQueryBucketSize is the limit for how large a segment 
> can be when it's created.  New events coming in are added, but in order to 
> create a large segment of profiles that already exist, it requires increasing 
> the aggregateQueryBucketSize.  
>  
> I increased the bucketSize to 100,000 and it took ~10min to create the 
> segment.  It was a pastEventCondition on a 3 node cluster with good 
> resources.  Our needs go well beyond 100,000.
>  
> I understand that queries of that size in ES should be paginated.  Are you 
> aware of a better way to achieve large segments of existing data or is this a 
> limitation at the moment?
>  
> I even looked into using batchProfileUpdate to add segments directly to the 
> profile but the performance was the same or worse since the condition is the 
> bottleneck.
>  
> Any insight into this issue would be greatly appreciated.
>  
> Thanks,
> Donnie



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
