Yes, I guess the idea would be to train the model on augmented, healthy data, then look for anomalies in either the raw or the augmented test data. I'm not sure which would work better; that's something to test ;)
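
A rough sketch of the kind of augmentation meant here, assuming the healthy ECG recording is just a plain list of floats. The function name `augment_signal` and all of its parameters are made up for illustration and are not part of NuPIC; the circular shift is a simplification that only makes sense for roughly periodic signals.

```python
import random


def augment_signal(signal, copies_per_level=3, max_shift=5,
                   noise_levels=(0.01, 0.05, 0.1), seed=0):
    """Return noisy, time-shifted copies of a 1-D signal (list of floats).

    Hypothetical helper: each noise level is a fraction of the signal's
    amplitude range and is used as the standard deviation of Gaussian noise.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    span = max(signal) - min(signal)   # amplitude range of the signal
    augmented = []
    for level in noise_levels:
        sigma = level * span
        for _ in range(copies_per_level):
            # Circular shift by a random number of samples (may be 0).
            shift = rng.randint(-max_shift, max_shift)
            shifted = signal[-shift:] + signal[:-shift]
            # Add independent Gaussian noise to every sample.
            augmented.append([x + rng.gauss(0.0, sigma) for x in shifted])
    return augmented
```

The augmented copies would be used for training only (with learning switched off afterwards); whether to also augment the test data is the open question above.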
____________________________

BE THE CHANGE YOU WANT TO SEE IN THE WORLD ...

> On 25 Oct 2015, at 15:52, Richard Crowder <[email protected]> wrote:
>
> Excellent, thanks Pascal. From an initial reading on data augmentation, and
> particularly the high-level descriptions of the 'aggregation technique' and
> 'probability technique', my first thought is: an HTM trained with SDRs of
> 'good' data, then with training turned off, could discover anomalies in
> other SDRs presented to it. About to delve further into data augmentation,
> but wanted to throw out that thought (potentially wrong).
>
>> On Sun, Oct 25, 2015 at 2:43 PM, Pascal Weinberger
>> <[email protected]> wrote:
>> From the nupic.audio discussion in PR #21, thanks to Richard's pointer:
>>
>> Ok, thanks :) I didn't notice this discussion :o Really interesting
>> problem... In deep learning, problems like this are usually solved by what
>> is called data augmentation, where you successively add different levels
>> of noise and shifts to your data to a) get more training data and b) be
>> more resilient to overfitting. I guess we could use a similar approach for
>> this problem. On data augmentation:
>> https://www.techopedia.com/definition/28033/data-augmentation
>> https://wwwf.imperial.ac.uk/~dvandyk/Research/01-jcgs-art.pdf
>> http://jmlr.org/proceedings/papers/v38/gan15.pdf
>> Or just Google ;D
>>
>>
>> Best
>>
>> Pascal
>>
>> ____________________________
>>
>> BE THE CHANGE YOU WANT TO SEE IN THE WORLD ...
>>
>>
>>> On 25 Oct 2015, at 16:17, Richard Crowder <[email protected]> wrote:
>>>
>>> For reference;
>>> http://www.intmath.com/blog/mathematics/math-of-ecgs-fourier-series-4281
>>>
>>>> On Sun, Oct 25, 2015 at 2:16 PM, Richard Crowder <[email protected]> wrote:
>>>> Also, covering Sergey's first question, what anomalies are you looking for?
>>>> See this report that uses a Fourier Transform and its inverse for R-peak
>>>> detection;
>>>> http://www.egr.msu.edu/classes/ece480/capstone/spring13/group03/documents/SignalProcessingofECGSignalsinMatlab.pdf
>>>>
>>>> PS: I have Matlab if required/helps. Although other ways in Python can be
>>>> used, see nupic.critic for example.
>>>>
>>>>> On Sat, Oct 24, 2015 at 6:13 PM, Richard Crowder <[email protected]> wrote:
>>>>> Kentaro, Sergey,
>>>>>
>>>>> I've been trying to get my head around available data for
>>>>> training/testing.
>>>>>
>>>>> For example, https://physionet.org/physiobank/ Graph viewing and download
>>>>> via
>>>>> https://physionet.org/cgi-bin/atm/ATM?database=mimic2wdb&tool=plot_waveforms
>>>>> (River view applicable?)
>>>>>
>>>>> Any idea what could be the best form of data, and which kind of data to
>>>>> obtain (ECG only?, with arterial blood pressure, need for labeling?). See
>>>>> graph here https://physionet.org/physiobank/database/mimic2wdb/
>>>>>
>>>>> Best regards, Richard.
>>>>>
>>>>>
>>>>>> On Thu, Oct 22, 2015 at 1:12 PM, 飯塚健太郎 <[email protected]> wrote:
>>>>>> Richard, Sergey,
>>>>>> Thank you for the replies.
>>>>>>
>>>>>> I read the replies carefully and noticed something.
>>>>>>
>>>>>> Currently, my code uses raw ECG data with NuPIC's Scalar Encoder and
>>>>>> TemporalAnomaly as the inferenceType.
>>>>>>
>>>>>> But there is another way:
>>>>>> use pre-encoded ECG data to learn and predict anomalies.
>>>>>>
>>>>>> I found an FFT used in the Audio Stream example.
>>>>>> https://github.com/numenta/nupic/blob/master/examples/audiostream/audiostream_tp.py#L249
>>>>>>
>>>>>> It might be better to use a wavelet or another encoding technique.
>>>>>> Such a technique makes the data more discrete and might be better
>>>>>> suited to detecting anomalies.
>>>>>>
>>>>>> I think I should learn about encoding techniques.
>>>>>> I'll read the paper Richard suggested, too.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> 2015-10-22 19:36 GMT+09:00 Richard Crowder <[email protected]>:
>>>>>>> Hello Kentaro,
>>>>>>>
>>>>>>> Sergey's questions, response, and paper link are important. The linked
>>>>>>> paper is the first I've read on ECG signal analysis, but it has a lot
>>>>>>> of cross-over with audio and speech signal analysis and recognition.
>>>>>>> Plus recent research into steganalysis [1].
>>>>>>>
>>>>>>> For example -
>>>>>>> The use of Wavelet transform, or Fourier Transform / DCT (both
>>>>>>> magnitude AND phase),
>>>>>>> Perceptual linear prediction, as opposed to Mel-Frequency Cepstral
>>>>>>> analysis,
>>>>>>> Very importantly, statistical analysis of spectral features -
>>>>>>> Wavelet/DCT with Hilbert transform, spectral envelope curve analysis
>>>>>>> and derivative tracking (velocity and acceleration of curve changes,
>>>>>>> can limit up to 5th order).
>>>>>>>
>>>>>>> A lot of this occurs within animals' brains, with mammals adding
>>>>>>> additional feedback and inference through the neocortex. As humans, we
>>>>>>> have exploited the spectral analysis within our 'old brain' to listen,
>>>>>>> detect, and track spectral features. Such as ECG signals, and sonar
>>>>>>> signals (hunting for shoals of fish and submarines), for example.
>>>>>>> Cross-over and similar analysis occurs in vision sensory analysis too
>>>>>>> (e.g. edge detection).
>>>>>>>
>>>>>>> Which points to the key questions: how are you encoding the ECG
>>>>>>> signals, and which classification techniques are you using?
>>>>>>>
>>>>>>> Best regards, Richard.
>>>>>>>
>>>>>>> 1 http://www.shsu.edu/~qxl005/new/publications/tifs_audiosteg.pdf
>>>>>>>
>>>>>>>> On Thu, Oct 22, 2015 at 10:18 AM, Sergey Alexashenko
>>>>>>>> <[email protected]> wrote:
>>>>>>>> Actually, I can write out the scenarios here.
>>>>>>>>
>>>>>>>> NuPIC should definitely be able to learn different people's heartbeats
>>>>>>>> in one model. You have to give it plenty of data to learn on.
>>>>>>>> Also, make sure to resetSequenceStates every time you start feeding in
>>>>>>>> data from a new person. Finally, you might want to shuffle the data so
>>>>>>>> that you don't feed it person 1, then person 2, then person 3, but
>>>>>>>> rather a mixture of all the data to reduce bias towards the latest
>>>>>>>> people (but I don't think that this is necessary to be honest).
>>>>>>>>
>>>>>>>> There is, however, the issue of encoding. I'm assuming that you are
>>>>>>>> using a scalar encoder produced by swarming. That's fine, that's a
>>>>>>>> quick approach and it might work (in fact I would bet that it will
>>>>>>>> produce usable results - be mindful of swarming on a data set
>>>>>>>> including different people's data, though!).
>>>>>>>>
>>>>>>>> However, if you think about the data type - ECG data, unlike, say, EEG
>>>>>>>> data, consists of almost perfectly discrete steps (heartbeats) which
>>>>>>>> could be matched to NuPIC timesteps very well. If you go through the
>>>>>>>> trouble of extracting features from your data (there is ample
>>>>>>>> literature on how to do it - see [1] for example), and creating
>>>>>>>> encoders for all the intervals/amplitudes, I think that NuPIC would do
>>>>>>>> a marvelous job. Note that this approach condenses the time interval
>>>>>>>> per step to one per heartbeat and, thus, is not going to work if you
>>>>>>>> are trying to do super-rapid detection or prediction (on a time scale
>>>>>>>> shorter than one heartbeat). It is also more time-consuming for you -
>>>>>>>> once again, swarming could work well enough.
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Sergey
>>>>>>>>
>>>>>>>> [1] http://arxiv.org/pdf/1005.0957.pdf
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Thu, Oct 22, 2015 at 1:58 AM, Sergey Alexashenko
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> Hello Kentaro,
>>>>>>>>>
>>>>>>>>> I think that NuPIC can definitely work with ECG data, but I need a
>>>>>>>>> little more information about your project to make any helpful
>>>>>>>>> suggestions. Two questions:
>>>>>>>>>
>>>>>>>>> 1) Are you trying to predict or detect anomalies? You use both terms,
>>>>>>>>> but they involve somewhat different mechanisms.
>>>>>>>>>
>>>>>>>>> 2) How are you encoding ECG data?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Sergey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 10:07 PM, Kentaro Iizuka
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hello NuPIC.
>>>>>>>>>>
>>>>>>>>>> Thank you, Matt, for the post.
>>>>>>>>>>
>>>>>>>>>> Here are the details of my question (the same as my Gitter post):
>>>>>>>>>> https://gist.github.com/iizukak/72526863d3f504f2ff5e
>>>>>>>>>>
>>>>>>>>>> I hope somebody has a good idea for that.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-10-22 13:29 GMT+09:00 Matthew Taylor <[email protected]>:
>>>>>>>>>> > Hello NuPIC,
>>>>>>>>>> >
>>>>>>>>>> > Check this out:
>>>>>>>>>> > https://gitter.im/numenta/htm-challenge/archives/2015/10/21
>>>>>>>>>> >
>>>>>>>>>> > Watch the ECG anomaly in the video:
>>>>>>>>>> > https://youtu.be/5KdwV-trMhE?t=1m41s
>>>>>>>>>> >
>>>>>>>>>> > He has an interesting question about how to train a model on a
>>>>>>>>>> > healthy heartbeat, and it is expressed well with pictures in the
>>>>>>>>>> > link above. He wants to train a model with the ECG history of more
>>>>>>>>>> > than one person to get a representation of a "healthy heartbeat".
>>>>>>>>>> > The problem is that every person's heartbeat is a little
>>>>>>>>>> > different.
>>>>>>>>>> > Is it feasible to train a model on multiple heartbeats in
>>>>>>>>>> > sequence? I'm not sure if it will work, but maybe someone has a
>>>>>>>>>> > better idea?
>>>>>>>>>> >
>>>>>>>>>> > Solving this problem would help in a lot of different signal
>>>>>>>>>> > analysis applications of HTM...
>>>>>>>>>> >
>>>>>>>>>> > ---------
>>>>>>>>>> > Matt Taylor
>>>>>>>>>> > OS Community Flag-Bearer
>>>>>>>>>> > Numenta
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Kentaro Iizuka <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Github
>>>>>>>>>> https://github.com/iizukak/
>>>>>>>>>>
>>>>>>>>>> Facebook
>>>>>>>>>> https://www.facebook.com/kentaroiizuka
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> 飯塚健太郎 ([email protected])
>>>>>>
>>>>>> Saitama University, Graduate School of Science and Engineering
>>>>>> Cryptographic Infrastructure Laboratory
>>>>>> First-year master's student
>
