Roger,

A basic time series construct is the "sliding" window used in conjunction with time/value data sorted by time. A sample implementation is on my GitHub:

https://github.com/jpatanooga/Caduceus/tree/master/src/tv/floe/caduceus/hadoop/movingaverage

There are two jobs in there, one that uses the shuffle and one that does not, to illustrate the difference. I have a blog post in draft that accompanies this code; I'll follow up and send you a copy.

From that code you should be able to build out a more complex time series / DSP process (using it as base code), something along the lines of a 1NN classifier:

https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/
https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/docs/openPDC%20Datamining%20Tools%20Guide.pdf
https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/src/TVA/Hadoop/MapReduce/Datamining/SAX/SlidingTSClassifier_kNN.java

I'm in the process of updating that older openPDC code to be more modern and modular for general data sources.

Josh

On Sat, Mar 5, 2011 at 12:05 AM, Roger Smith <[email protected]> wrote:
> All -
> I wonder if any of you have integrated a DSP library with Hadoop.
> We are considering using Hadoop to process time series data, but don't
> want to write standard DSP functions.
>
> Roger.
>

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv
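
P.S. To make the "sliding window over sorted time/value data" idea concrete, here is a minimal single-machine sketch in plain Java. It is not the code from the Caduceus repository and uses no Hadoop APIs; the class name `SlidingWindowSketch`, the `Point` type, and the `movingAverage` method are hypothetical names chosen for illustration. It assumes a fixed window measured in number of samples rather than in elapsed time. In a MapReduce job, the sort step would typically be handled by the shuffle (or done inside the reducer), and the loop below would run per key in the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWindowSketch {

    /** One (timestamp, value) sample. Hypothetical type, for illustration only. */
    static final class Point {
        final long timestamp;
        final double value;

        Point(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Sorts the samples by timestamp, then slides a window of windowSize
     * samples across them and emits one average per full window, stamped
     * with the timestamp of the newest sample in that window.
     */
    static List<Point> movingAverage(List<Point> points, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        // Copy before sorting so the caller's list is left untouched.
        List<Point> sorted = new ArrayList<>(points);
        sorted.sort((a, b) -> Long.compare(a.timestamp, b.timestamp));

        List<Point> out = new ArrayList<>();
        double sum = 0.0;
        for (int i = 0; i < sorted.size(); i++) {
            sum += sorted.get(i).value;                   // newest sample enters
            if (i >= windowSize) {
                sum -= sorted.get(i - windowSize).value;  // oldest sample leaves
            }
            if (i >= windowSize - 1) {                    // window is full
                out.add(new Point(sorted.get(i).timestamp, sum / windowSize));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Deliberately out of order to show why the sort matters.
        List<Point> samples = Arrays.asList(
                new Point(3, 30.0), new Point(1, 10.0),
                new Point(2, 20.0), new Point(4, 40.0));
        for (Point p : movingAverage(samples, 3)) {
            System.out.println(p.timestamp + "\t" + p.value);
        }
    }
}
```

With the four samples in `main` and a window of 3, this emits two averages: 20.0 at timestamp 3 and 30.0 at timestamp 4. The running sum keeps the work per sample constant; for long series of floating-point values you may prefer to recompute each window's sum to avoid accumulated rounding error.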
