Thanks for the link, it looks pretty interesting. How long are your filters typically? I guess there is no need for frequency domain processing if the filters are fairly short.
Out of interest do you see any need for 2D filtering?

On Dec 7, 2011, at 9:52 AM, Josh Patterson <[email protected]> wrote:

> We did that with the openPDC classifications system where we broke up
> high resolution PMU/sensor data into "blocks of time + sensor id"
> buckets, with some overlap.
>
> code at: http://openpdc.codeplex.com
>
> The Cloudera article is just a basic example illustrating the
> secondary sort mechanic, which is key for time series on hadoop (sort
> for free).
>
> The openPDC has one MR job that scans time series for fuzzy patterns
> using Keogh's SAX/iSAX technique and a 1NN classifier based on a
> BallTree.
>
> Josh
>
> On Tue, Dec 6, 2011 at 5:52 PM, Raphael Cendrillon
> <[email protected]> wrote:
>> If the data series is large it might be interesting to further split the job
>> over time using overlap/add or overlap/save, or even an FFT suitably
>> partitioned.
>>
>> On Dec 6, 2011, at 1:48 PM, Josh Patterson <[email protected]> wrote:
>>
>>> Mahout currently does not have, afaik, much/any time series specific
>>> code for it. If I were to point someone at some good resources I'd
>>> start with:
>>>
>>> - Box and Jenkins book
>>> - Dr Keogh's line of research on time series pattern matching
>>>
>>> And then beyond that it begins to become "what are you specifically
>>> looking for?". R is typically the "go to" resource for a lot of time
>>> series work, but there has been some very successful work with Hadoop
>>> and large scale time series data. Below I link to a few articles where
>>> time series techniques are demonstrated with Hadoop. Specifically here
>>> is a blog article on general time series processing with Hadoop:
>>>
>>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>>>
>>> Beyond that you could take a look at how we applied these concepts to
>>> the US powergrid PMU / smartgrid data back in 2009:
>>>
>>> http://openpdc.codeplex.com
>>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>>>
>>> Hope that gets you going,
>>>
>>> Josh
>>>
>>> 2011/12/4 myn <[email protected]>:
>>>> does mahout contain this method?
>>>> or is there any other open source project about this?
>>>
>>> --
>>> Twitter: @jpatanooga
>>> Solution Architect @ Cloudera
>>> hadoop: http://www.cloudera.com
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
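
For anyone following along, here is a rough sketch of the overlap/add idea mentioned in the thread: filter the series one time block at a time and sum the tails that spill across block boundaries. It is plain Python written only for illustration, not code from Mahout or openPDC. The names `convolve`, `overlap_add` and `block_size` are made up for the example, and each block uses direct time-domain convolution rather than an FFT, which is probably reasonable only when the filter is short.

```python
def convolve(x, h):
    """Direct (time-domain) linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_size):
    """Filter x with h one block at a time, summing the overlapping tails."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Each block yields len(block) + len(h) - 1 output samples; the last
        # len(h) - 1 of them land in the next block's output range and are
        # added to whatever that block contributes there.
        for k, v in enumerate(convolve(block, h)):
            y[start + k] += v
    return y


# Hypothetical data: a short repeating series and a 3-tap smoothing filter.
signal = [float(n % 7) for n in range(50)]
taps = [0.25, 0.5, 0.25]

blocked = overlap_add(signal, taps, block_size=8)
direct = convolve(signal, taps)

# The blockwise result should match filtering the whole series at once.
assert all(abs(a - b) < 1e-9 for a, b in zip(blocked, direct))
```

In a MapReduce setting each block could in principle be handled by a separate task, with the summing of the tails done where the block outputs are merged; that mapping is my assumption and not something described in the linked articles.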
