I am not aware of any tool that does this for you directly, but it should not
be too difficult to write what you want using Pig or Hive.  I am not as
familiar with Jaql, but I assume that you can do it there too.  It might be
simpler, though, to write it directly in Map/Reduce, because there we can
abuse Map/Reduce in ways that the higher-level languages disallow in order to
leave room for their own optimizations.

What I would do is have the mapper scan through each entry and look for
transitions of $value across $threshold, recording the time at which each one
occurred.  Within that partition you can then look for windows of 30+ seconds
where $value > $threshold and output them to the reducer.  The trick is that
you need to pay special attention to the beginning and end of the partition,
because a window may start in one partition and finish in the next.  So you
also need to send the reducer the state at the beginning and end of each
partition and how long it was in that state.  The reducer can then combine
these pieces and see whether the combined window meets the 30+ second
criterion.  If so, output it with the rest; otherwise don't.

The windows already known to be 30+ seconds long can be sent to any reducer,
so they can have any key, but for the boundary pieces to be combined correctly
they all have to go to a single reducer, so they should share one specific
key.  You could also try to divide them up if you have to scale to very large
inputs, but that would be rather difficult to get right.
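
A rough sketch of the idea above, written as plain Python that simulates the
two phases rather than as real Hadoop API code.  Everything in it is an
assumption made for illustration: the function names (map_partition,
reduce_edges, find_windows), the "done"/"edge" keys, the (timestamp, value)
sample format, and the premise that partitions are non-empty, time-ordered,
and numbered consecutively.  It also measures a window from its first to its
last above-threshold sample, which is only one possible definition.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def map_partition(part_index: int, samples: List[Sample], threshold: float,
                  min_duration: float = 30.0) -> Iterator[tuple]:
    """Mapper: find runs of value > threshold inside one partition."""
    runs = []  # each run is [start_time, end_time, first_index, last_index]
    for i, (t, v) in enumerate(samples):
        if v > threshold:
            if runs and runs[-1][3] == i - 1:
                runs[-1][1], runs[-1][3] = t, i  # extend the current run
            else:
                runs.append([t, t, i, i])        # a new run starts here
    for start, end, first, last in runs:
        at_start = first == 0
        at_end = last == len(samples) - 1
        if at_start or at_end:
            # May continue in a neighbouring partition: one shared key,
            # so that a single reducer sees all boundary pieces.
            yield "edge", (part_index, start, end, at_start, at_end)
        elif end - start >= min_duration:
            # Complete window; any key (any reducer) would do.
            yield "done", (start, end)


def reduce_edges(fragments: List[tuple],
                 min_duration: float = 30.0) -> List[Tuple[float, float]]:
    """Reducer for the "edge" key: stitch pieces across partition borders."""
    windows = []
    current = None  # [start, end, still_open_at_partition_end, last_partition]
    for part, start, end, at_start, at_end in sorted(fragments):
        if current and current[2] and at_start and part == current[3] + 1:
            current[1], current[2], current[3] = end, at_end, part
            continue
        if current and current[1] - current[0] >= min_duration:
            windows.append((current[0], current[1]))
        current = [start, end, at_end, part]
    if current and current[1] - current[0] >= min_duration:
        windows.append((current[0], current[1]))
    return windows


def find_windows(partitions: List[List[Sample]], threshold: float,
                 min_duration: float = 30.0) -> List[Tuple[float, float]]:
    """Local driver standing in for the shuffle between map and reduce."""
    done, edges = [], []
    for idx, samples in enumerate(partitions):
        for key, payload in map_partition(idx, samples, threshold,
                                          min_duration):
            (edges if key == "edge" else done).append(payload)
    return sorted(done + reduce_edges(edges, min_duration))
```

Calling find_windows with a list of partitions and a threshold returns the
(start, end) pairs that qualify, including windows that straddle a partition
border.  In a real job the mapper would also need some way to learn its
partition's position in the input, which this sketch simply takes as given.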

--Bobby Evans


On 3/29/12 4:02 AM, "banermatt" <banerm...@hotmail.fr> wrote:



Hello,

I'm developing a log file anomaly detection system on a Hadoop cluster.
I'm looking for a way to process a query like: "select all values when
value>threshold for a duration>30 seconds". Do you know a tool which could
help me to process such a query?
I read up on the scripting languages Pig, Hive and Jaql, which seem to have
very similar applications. I tried them but I was not able to do what I
want.

Thank you in advance,

Matthieu

--
View this message in context: 
http://old.nabble.com/Temporal-query-tp33544869p33544869.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

