Hi,

I am looking for an efficient way to do inter-document queries in 
Elasticsearch. Specifically, I want to count the number of users that went 
through an exit point B after visiting point A.

In general terms, say we have some event log data about user actions on a 
website:
....
{"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":"US", "tstamp":"2013-04-01 00:01:01"}
{"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":"CN", "tstamp":"2013-04-01 00:02:11"}
{"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":"US", "tstamp":"2013-04-01 00:06:31"}
{"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":"DE", "tstamp":"2013-04-01 00:08:00"}
{"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":"CN", "tstamp":"2013-04-01 00:08:55"}
{"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":"US", "tstamp":"2013-04-01 00:09:46"}
{"userid":"abc", "machineid":"110527022", "path":"/promo/A", "country":"DE", "tstamp":"2013-04-01 00:10:46"}
....
And we have 500+M such entries.

We want a count of the number of userids that visited path=/sale/B after 
visiting path=/promo/A.
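
To pin down the intended semantics, here is a minimal Python sketch of that count, run over the sample events above. It assumes the events arrive in tstamp order, as in the log excerpt; the function name count_a_then_b and the trimmed (userid, path) tuples are illustrative only and are not part of any real pipeline or Elasticsearch API.

```python
def count_a_then_b(events, a="/promo/A", b="/sale/B"):
    """Count distinct userids that hit path `b` after an earlier hit on `a`.

    Assumes `events` is an iterable of (userid, path) pairs that is
    already in timestamp order.
    """
    seen_a = set()   # users who have visited `a` so far
    matched = set()  # users who visited `b` after visiting `a`
    for userid, path in events:
        if path == a:
            seen_a.add(userid)
        elif path == b and userid in seen_a:
            matched.add(userid)
    return len(matched)


# The sample log from above, reduced to (userid, path) in tstamp order.
sample = [
    ("xyz", "/promo/A"),
    ("pdq", "/page/1"),
    ("xyz", "/promo/D"),
    ("abc", "/page/23"),
    ("pdq", "/page/2"),
    ("xyz", "/sale/B"),
    ("abc", "/promo/A"),
]

print(count_a_then_b(sample))  # 1: only xyz reaches /sale/B after /promo/A
```

On this sample the answer is 1: xyz visits /promo/A and later /sale/B, while abc visits /promo/A but never reaches /sale/B.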

What I did was preprocess the data: sort by <userid, tstamp>, then compact 
all events for the same userid into a single document. I then wrote a 
script filter that traverses the path array of each document and returns 
true if it finds any occurrence of A followed by B. This, however, is 
inefficient. Most of our queries take 1 or 2 seconds on 100+M events, but 
this script filter query takes over 300 seconds; that is, it processes 
only about 400K events per second. By comparison, I wrote a naive program 
that does a linear pass over the un-compacted data, and it processes 11M 
events per second. From this I conclude that Elasticsearch does not do 
well on this type of query.
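
For concreteness, the compaction step and the per-document check can be sketched in plain Python as below. This is only an illustration of the logic, not the actual preprocessing job or script filter; the names compact and visited_a_then_b and the "paths" field are assumptions made up for the example. It also relies on the tstamp strings being in the fixed "YYYY-MM-DD HH:MM:SS" format, so that sorting them as strings matches chronological order.

```python
from itertools import groupby
from operator import itemgetter


def compact(events):
    """Fold (userid, tstamp, path) events into one document per userid,
    with that user's paths listed in timestamp order."""
    ordered = sorted(events, key=itemgetter(0, 1))  # sort by <userid, tstamp>
    return [
        {"userid": uid, "paths": [path for _, _, path in group]}
        for uid, group in groupby(ordered, key=itemgetter(0))
    ]


def visited_a_then_b(paths, a="/promo/A", b="/sale/B"):
    """Return True if some occurrence of `a` is followed later by `b`."""
    try:
        first_a = paths.index(a)  # earliest visit to `a`
    except ValueError:
        return False              # user never visited `a`
    return b in paths[first_a + 1:]


events = [
    ("xyz", "2013-04-01 00:01:01", "/promo/A"),
    ("pdq", "2013-04-01 00:02:11", "/page/1"),
    ("xyz", "2013-04-01 00:06:31", "/promo/D"),
    ("abc", "2013-04-01 00:08:00", "/page/23"),
    ("pdq", "2013-04-01 00:08:55", "/page/2"),
    ("xyz", "2013-04-01 00:09:46", "/sale/B"),
    ("abc", "2013-04-01 00:10:46", "/promo/A"),
]

docs = compact(events)
count = sum(visited_a_then_b(doc["paths"]) for doc in docs)
print(count)  # 1
```

Checking only from the earliest A is enough: if any B follows any A, it also follows the first A.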

I am hoping someone can suggest a more efficient way to run this query in 
ES, or else confirm that ES cannot do inter-document queries well.

Thanks,
Zennet

