Also check out Hadoop Rumen

On Thu, Mar 29, 2012 at 10:22 AM, Tom Deutsch <tdeut...@us.ibm.com> wrote:
> Matthieu - you are welcome to contact me off list for assistance with Jaql.
>
> ---------------------------------------
> Sent from my Blackberry so please excuse typing and spelling errors.
>
>
> ----- Original Message -----
> From: Robert Evans [ev...@yahoo-inc.com]
> Sent: 03/29/2012 10:09 AM EST
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>;
>     "core-u...@hadoop.apache.org" <core-u...@hadoop.apache.org>
> Subject: Re: Temporal query
>
> I am not aware of anyone that does this for you directly, but it should
> not be too difficult for you to write what you want using pig or hive. I
> am not as familiar with Jaql but I assume that you can do it there too.
> Although it might be simpler to write it using Map/Reduce because we can
> abuse Map/Reduce in ways that the higher level languages disallow so that
> they can do optimizations.
>
> What I would do is in the mapper scan through each entry and look for
> transitions of $value around $threshold, and the time that they occurred.
> You can then look for 30+ second windows where $value > $threshold within
> that partition and output them to the reducer. The trick with this is that
> you need to pay special attention to the beginning and end of the
> partition. You need to also send to the reducer the state at the beginning
> and end of each partition and how long it was in that state. The reducer
> can then combine these pieces together and see if they meet the 30+ second
> criteria. If so output them with the rest, otherwise don't. The known
> times when it is > 30 seconds can be sent to any reducer, so they can have
> any key, but for the transitions to work correctly you need to send them to
> a single reducer, so they should have a very specific key. You could also
> try to divide them up if you have to scale very very large, but that would
> be rather difficult to get right.
>
> --Bobby Evans
>
>
> On 3/29/12 4:02 AM, "banermatt" <banerm...@hotmail.fr> wrote:
>
> Hello,
>
> I'm developing a log file anomaly detection system on a Hadoop cluster.
> I'm looking for a way to process a query like: "select all values when
> value>threshold for a duration>30 seconds". Do you know a tool which could
> help me to process such a query?
> I read up on the scripting languages Pig, Hive and Jaql, which seem to have
> very similar applications. I tried them but I was not able to do what I
> want.
>
> Thank you in advance,
>
> Matthieu
>
> --
> View this message in context:
> http://old.nabble.com/Temporal-query-tp33544869p33544869.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
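
A rough sketch of the mapper/reducer split Bobby describes, written as plain Python that simulates the two phases rather than as real Hadoop code. Everything named here is an assumption made for illustration: the functions map_partition, reduce_fragments and find_windows, the THRESHOLD value, the (timestamp_seconds, value) sample format, and the partition index used to tell which partitions are neighbours. It also assumes partitions are non-empty and time-ordered, that a window's duration is measured from its first to its last sample above the threshold, and that "30+ seconds" means >= 30.

```python
from itertools import groupby

THRESHOLD = 100.0   # hypothetical threshold, not from the thread
MIN_DURATION = 30   # seconds; ">= 30" is an assumed reading of "30+ seconds"


def map_partition(index, samples, threshold=THRESHOLD, min_duration=MIN_DURATION):
    """Mapper phase for one partition of time-ordered (timestamp, value) samples.

    Returns (complete, fragments):
      complete  - windows fully inside the partition that already qualify
      fragments - runs touching the partition's first or last sample, which
                  only the reducer can judge because they may continue next door
    """
    complete, fragments = [], []
    pos = 0
    for above, group in groupby(samples, key=lambda s: s[1] > threshold):
        run = list(group)
        first, last = pos, pos + len(run) - 1
        pos += len(run)
        if not above:
            continue
        start, end = run[0][0], run[-1][0]
        touches_start = first == 0
        touches_end = last == len(samples) - 1
        if touches_start or touches_end:
            fragments.append((index, start, end, touches_start, touches_end))
        elif end - start >= min_duration:
            complete.append((start, end))
    return complete, fragments


def reduce_fragments(fragments, min_duration=MIN_DURATION):
    """Reducer phase: stitch boundary runs from adjacent partitions together."""
    windows = []
    current = None  # (start, end, last_partition_index, open_at_partition_end)

    def flush():
        if current and current[1] - current[0] >= min_duration:
            windows.append((current[0], current[1]))

    # Sorting by (partition index, start time) puts neighbours side by side.
    for index, start, end, touches_start, touches_end in sorted(fragments):
        continues = (current is not None and current[3]
                     and touches_start and index == current[2] + 1)
        if continues:
            current = (current[0], end, index, touches_end)
        else:
            flush()
            current = (start, end, index, touches_end)
    flush()
    return windows


def find_windows(partitions):
    """Run both phases over a list of partitions and return sorted windows."""
    complete, fragments = [], []
    for index, samples in enumerate(partitions):
        done, edge = map_partition(index, samples)
        complete.extend(done)
        fragments.extend(edge)  # in Hadoop these would all share one key
    return sorted(complete + reduce_fragments(fragments))
```

In a real job the complete windows could be emitted under any key, while the boundary fragments would all go to a single reducer under one fixed key, as the reply suggests. The partition index matters: without it, a run ending one partition could be wrongly joined to a run starting two partitions later, with a partition entirely below the threshold in between.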