I am maintaining a year of user activity data, including browse and purchase
events. Each browse/purchase entry is a JSON object: {item_id: id1,
item_name: name1, category: c1, brand: b1, event_time: t1}.

I would like to compose different queries, such as getting all customers
who browsed item A and/or purchased item B within the time range t1 to t2.
There are tens of millions of customers.

My current design is to use a nested object for each customer:
customer1:
       customer_id: id1,
       name: name1,
       country: US,
       browse: [{browse_entry1_json}, {browse_entry2_json}, ...],
       purchase: [{purchase_entry1_json}, {purchase_entry2_json}, ...]
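To make the nested design concrete, here is a minimal sketch of a mapping and of the "browsed A and/or purchased B in [t1, t2]" query, written as plain Python dicts in the Elasticsearch query DSL. It assumes a recent Elasticsearch version (keyword/text field types and the bool `filter` clause; on 1.x the equivalents would be not_analyzed string fields and the filtered query). The helper names are hypothetical.

```python
import json

# Fields of one browse/purchase entry, as listed above.
ENTRY_PROPERTIES = {
    "item_id": {"type": "keyword"},
    "item_name": {"type": "text"},
    "category": {"type": "keyword"},
    "brand": {"type": "keyword"},
    "event_time": {"type": "date"},
}

# One document per customer; "nested" keeps each entry's fields together,
# so item_id and event_time are matched within the same entry.
CUSTOMER_MAPPING = {
    "properties": {
        "customer_id": {"type": "keyword"},
        "name": {"type": "text"},
        "country": {"type": "keyword"},
        "browse": {"type": "nested", "properties": ENTRY_PROPERTIES},
        "purchase": {"type": "nested", "properties": ENTRY_PROPERTIES},
    }
}


def nested_event(path, item_id, t1, t2):
    """Customers with at least one `path` entry for item_id inside [t1, t2]."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {f"{path}.item_id": item_id}},
                        {"range": {f"{path}.event_time": {"gte": t1, "lte": t2}}},
                    ]
                }
            },
        }
    }


def browsed_and_or_purchased(item_a, item_b, t1, t2, both=True):
    """both=True means AND; both=False means OR (at least one clause matches)."""
    clauses = [
        nested_event("browse", item_a, t1, t2),
        nested_event("purchase", item_b, t1, t2),
    ]
    if both:
        return {"query": {"bool": {"filter": clauses}}}
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(browsed_and_or_purchased("A", "B", "2014-01-01", "2014-12-31"), indent=2))
```

Because the whole customer is one document, the AND/OR is evaluated per customer in a single query, and the hit count is directly the number of matching customers.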

With this design, I can easily compose all kinds of queries using nested
queries. The only problem is that it is hard to expire older browse/purchase
data: I only want to keep, for example, one year of browse/purchase data. With
this design, I would at some point have to read the entire index out,
delete the expired browse/purchase data, and write it back.
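The "read, prune, write back" step for a single customer document could look roughly like the sketch below. It assumes event_time is stored as an ISO-8601 string with an explicit UTC offset; the function names and the 365-day retention are hypothetical, and the actual reading and re-indexing (for example through scroll and bulk requests) is left out.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, days=365):
    """Oldest event_time to keep, assuming a hypothetical one-year retention."""
    return now - timedelta(days=days)


def prune_customer(doc, cutoff):
    """Return (pruned_copy, changed).

    Keeps only browse/purchase entries whose event_time is at or after cutoff.
    Assumes event_time looks like "2014-03-01T12:00:00+00:00" and that
    cutoff is a timezone-aware datetime.
    """
    pruned = dict(doc)  # shallow copy; the input document is not modified
    changed = False
    for field in ("browse", "purchase"):
        entries = doc.get(field, [])
        kept = [e for e in entries
                if datetime.fromisoformat(e["event_time"]) >= cutoff]
        changed = changed or len(kept) != len(entries)
        pruned[field] = kept
    return pruned, changed
```

Only documents for which `changed` is True would need to be written back, but every customer document still has to be read, which is the cost described above.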

Another design is to use a parent/child structure:
type user is the parent of types browse and purchase;
type browse contains one document per browse entry.
Although deleting old data seems easier with delete-by-query, for the
above query I would have to combine multiple has_child queries with and/or,
which would be much less performant. In fact, I was initially using a
parent/child structure, but query times were really long, so I gave it up
and switched to nested objects.

I am also thinking about using nested objects but splitting the data into
different indices (e.g. monthly indices) so that I can easily expire old data.
The problem with this approach is that I would have to query across those
multiple indices and aggregate over them to get the distinct users,
which I assume would be much slower (I haven't tried it yet). One requirement
of this project is to return the count for a query within an acceptable
time frame (a few seconds), and I am afraid this approach may not meet it.
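A rough sketch of the bookkeeping for the monthly-index idea: which indices a t1..t2 query has to hit, which indices can simply be dropped once they fall outside the retention window, and a cardinality aggregation for counting distinct users across them. The "prefix-YYYY.MM" naming scheme and helper names are hypothetical.

```python
from datetime import date


def monthly_indices(prefix, start, end):
    """Names of the monthly indices (prefix-YYYY.MM) covering start..end inclusive."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"{prefix}-{y:04d}.{m:02d}")
        m += 1
        if m == 13:
            y, m = y + 1, 1
    return names


def expired_indices(existing, prefix, keep_from):
    """Indices older than the month containing keep_from; each can be deleted whole.

    Zero-padded YYYY.MM names sort chronologically, so string comparison works.
    """
    cutoff = f"{prefix}-{keep_from.year:04d}.{keep_from.month:02d}"
    return sorted(n for n in existing if n.startswith(prefix + "-") and n < cutoff)


# Distinct-user count over all matched documents; the cardinality
# aggregation returns an approximate count.
DISTINCT_USERS_AGG = {
    "size": 0,
    "aggs": {"distinct_users": {"cardinality": {"field": "customer_id"}}},
}
```

One caveat with this layout: if each monthly index holds one document per customer per month, a customer who browsed A in one month and purchased B in another has the two events in different documents, so an AND query cannot be answered by a single per-document match plus a distinct count.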

The ES cluster has 7 machines, each with 8 cores and 32 GB of memory.
Any suggestions?

Thanks in advance!
Chen

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e1279e50-4ec7-4292-8ef3-49bc187498c1%40googlegroups.com.