[ https://issues.apache.org/jira/browse/UNOMI-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844373#comment-16844373 ]
Jonathan Kressaty commented on UNOMI-172:
-----------------------------------------
I'd like to +1 this. Once deployed, my use of Unomi is going to expand to
millions of profiles in a matter of weeks or months. We're prepared for what
that entails from a resource perspective, but it will be crucial that we can
create segments well into the future and have them apply to every profile in
the system. Segments will be used for audience building to determine what
tests to run, and potentially to export profiles from ES for use in
third-party systems.
[~shuber] I don't quite follow your final suggestion. Are you saying to use a
rule to add a new property value to a profile, and then build a segment that
is a very simple match on that property, so that the query is more efficient?
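To check that I'm reading the suggestion correctly, here is a rough sketch of what I think it means, written as Python dicts that could be serialized to JSON. The property name `properties.inLargeAudience`, the ids, and the `view` event type are made up for illustration. The condition and action type names (`eventTypeCondition`, `setPropertyAction`, `profilePropertyCondition`) and their parameters are as I understand them from the Unomi documentation, and should be verified against the version in use.

```python
import json

# Hypothetical flag property, used only for illustration.
FLAG_PROPERTY = "properties.inLargeAudience"


def build_flag_rule(event_type: str) -> dict:
    """Rule that sets a flag property on the profile when a matching event arrives."""
    return {
        "metadata": {
            "id": "flagLargeAudience",  # hypothetical id
            "name": "Flag large audience",
            "scope": "systemscope",
        },
        "condition": {
            "type": "eventTypeCondition",
            "parameterValues": {"eventTypeId": event_type},
        },
        "actions": [
            {
                "type": "setPropertyAction",
                "parameterValues": {
                    "setPropertyName": FLAG_PROPERTY,
                    "setPropertyValue": "true",
                    "storeInSession": False,
                },
            }
        ],
    }


def build_flag_segment() -> dict:
    """Segment whose condition is a plain equality match on the flag property."""
    return {
        "metadata": {
            "id": "largeAudience",  # hypothetical id
            "name": "Large audience",
            "scope": "systemscope",
        },
        "condition": {
            "type": "profilePropertyCondition",
            "parameterValues": {
                "propertyName": FLAG_PROPERTY,
                "comparisonOperator": "equals",
                "propertyValue": "true",
            },
        },
    }


rule = build_flag_rule("view")
segment = build_flag_segment()
print(json.dumps(rule, indent=2))
print(json.dumps(segment, indent=2))
```

If that is the idea, the expensive condition is evaluated per event by the rule, and the segment itself only has to match a single indexed property.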
Thanks for taking a look at this!
> Better way to create large profile segments
> -------------------------------------------
>
> Key: UNOMI-172
> URL: https://issues.apache.org/jira/browse/UNOMI-172
> Project: Apache Unomi
> Issue Type: Improvement
> Reporter: Don Hinshaw
> Priority: Minor
>
> Right now the aggregateQueryBucketSize is the limit for how large a segment
> can be when it's created. New events coming in are added, but creating a
> large segment of profiles that already exist requires increasing the
> aggregateQueryBucketSize.
>
> I increased the bucketSize to 100,000 and it took ~10 min to create the
> segment. It was a pastEventCondition on a 3-node cluster with good
> resources. Our needs go well beyond 100,000.
>
> I understand that queries of that size in ES should be paginated. Are you
> aware of a better way to achieve large segments of existing data or is this a
> limitation at the moment?
>
> I even looked into using batchProfileUpdate to add segments directly to the
> profile but the performance was the same or worse since the condition is the
> bottleneck.
>
> Any insight into this issue would be greatly appreciated.
>
> Thanks,
> Donnie
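
On the pagination point in the quoted description: a minimal sketch of walking a large result set page by page with Elasticsearch's `search_after`, rather than asking for everything in one oversized request. This is not how Unomi builds segments today. The `search` callable is a hypothetical stand-in for whatever client call sends the request body and returns the list of hits, and sorting on `itemId` assumes that field is unique per profile.

```python
from typing import Callable, Iterator, List


def iter_all_hits(search: Callable[[dict], List[dict]],
                  page_size: int = 1000) -> Iterator[dict]:
    """Yield every hit by requesting fixed-size pages with search_after."""
    body = {
        "size": page_size,
        # A unique sort key is needed so that no hit is skipped or repeated.
        "sort": [{"itemId": "asc"}],  # assumed unique field
        "query": {"match_all": {}},
    }
    while True:
        hits = search(body)
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        body["search_after"] = hits[-1]["sort"]
```

Each hit is expected to carry its `sort` values, as Elasticsearch returns them for sorted searches; memory use stays bounded by `page_size` regardless of how many profiles match.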
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)