Hey Ken, all, I'm getting some thoughts down on this. Here's a start, more to come later. Hopefully in time to help you with the challenge event, or at least for others to help out.

Electrocardiography (ECG, EKG) [1] recordings, alongside temporal memories, are a great way to introduce a variety of sensor signal analysis techniques, because ECG signals have known physiological limits and are fairly repetitive. The beat frequency of the heart ranges from 0 beats per minute (cardiac flatline) to, 'on average', an upper limit of 220 - age (maximum heart rate (MHR) for humans [2], Haskell and Fox). Although 206.9 - (0.67 x age) (Gellish et al., very close to the 208 - (0.7 x age) of Tanaka, Monahan, & Seals) is generally a more accepted formula.

Digital analysis of continuous time signals can quickly become quite mathematically complicated. Other factors, such as limits on memory size, lead to various techniques for overcoming the problems of sampling a continuous time-varying signal, and to representational transformations of the sampled discrete signal to determine features of interest.

With ECG signals we can take some short cuts through the math complexities, taking advantage of the heart and how its beating is recorded.

Continuous Signal Sampling
--------------------------

For Ken's HTM Challenge we have to hope that the digital sampling is at a high enough rate, that is, more than twice the highest frequency we care about in the signal. Sampling is usually very linear and regular, and it needs enough regularity to capture important temporal transitions in the signal. This relates to the Nyquist–Shannon sampling theorem [3], and others such as Whittaker.

https://github.com/unpingco/Python-for-Signal-Processing/blob/master/Sampling_Theorem.ipynb

Signal Window Considerations
----------------------------

Choice of window function for reducing the continuous time-varying signal to a more manageable size is VERY VERY important (often overlooked). Spectral leakage, and other badness, can occur and set back work on signal feature analysis. For a detailed look at window functions [4] see -

Harris, Fredric J. "On the use of windows for harmonic analysis with the discrete Fourier transform."
Proceedings of the IEEE 66.1 (1978): 51-83

https://github.com/unpingco/Python-for-Signal-Processing/blob/master/Windowing.ipynb

Discrete Fourier Transform (DFT)
--------------------------------

With the regular di-dum..di-dum..di-dum.. beating of the heart, and the subsequent digitally sampled signal, we can use the simpler Fourier Transform to obtain a different representation of the signal. Other transforms, such as Wavelet based ones, could be used. But most people are familiar with the Fourier one, and we can use its fast implementations, called the fast Fourier transform (FFT), which are simplest and fastest when the number of samples is a power of 2. This transform places the signal into the frequency domain, which for repetitive signals is great for picking out features in the signal. [5]

1 http://www.ivline.org/2010/05/quick-guide-to-ecg.html
2 https://en.wikipedia.org/wiki/Heart_rate#Maximum_heart_rate
3 https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
4 https://en.wikipedia.org/wiki/Window_function
5 https://en.wikipedia.org/wiki/Discrete_Fourier_transform

On Sun, Oct 25, 2015 at 3:18 PM, Pascal Weinberger <[email protected]> wrote:

> Yes, I guess the idea would be to train the model with augmented, healthy
> data. Then try to find anomalies in either raw test data or augmented test
> data, I'm not sure what would work better, that's up to test ;)
>
> ____________________________
>
> BE THE CHANGE YOU WANT TO SEE IN THE WORLD ...
>
>
> On 25 Oct 2015, at 15:52, Richard Crowder <[email protected]> wrote:
>
> Excellent, thanks Pascal. From initial reading of data augmentation, and
> particularly the high-level descriptions of 'aggregation technique' and
> 'probability technique', my first thoughts are; A HTM trained with SDRs of
> 'good' data, then training turned off, could then discover anomalies in
> other SDRs presented to it. About to delve further in data augmentation,
> but wanted to throw out that thought (potentially wrong).
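
To make the augmentation idea in this exchange a bit more concrete (adding different levels of noise and shifts to healthy data, as Pascal describes further down the thread), here is a rough sketch in Python with NumPy. It is only an illustration under my own assumptions: the function name augment_ecg, the noise levels, the maximum shift and the sine-wave stand-in for a healthy recording are all made up for the example, and the shift is circular, which only really makes sense for a roughly periodic signal.

```python
import numpy as np

def augment_ecg(signal, noise_levels=(0.01, 0.05), max_shift=20, seed=0):
    """Return noisy, time-shifted copies of a 1-D signal.

    noise_levels, max_shift and seed are illustrative assumptions,
    not values taken from this thread.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    scale = signal.std()
    copies = []
    for level in noise_levels:
        # Circular shift by a random number of samples in [-max_shift, max_shift].
        shift = int(rng.integers(-max_shift, max_shift + 1))
        shifted = np.roll(signal, shift)
        # Additive Gaussian noise, scaled to the signal's own spread.
        noisy = shifted + rng.normal(0.0, level * scale, size=signal.shape)
        copies.append(noisy)
    return copies

# Toy stand-in for a healthy recording: a 1.25 Hz sine sampled at 256 Hz.
t = np.arange(1024) / 256.0
healthy = np.sin(2 * np.pi * 1.25 * t)
augmented = augment_ecg(healthy)
print(len(augmented), augmented[0].shape)
```

Each copy could then be fed to the model as extra 'good' training data. Whether testing is better done on raw or augmented data is still the open question from above.
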
>
> On Sun, Oct 25, 2015 at 2:43 PM, Pascal Weinberger <[email protected]> wrote:
>
>> From the nupic.audio discussion in PR #21 thanks to Richard's pointer:
>>
>> Ok, thanks :) I didn't notice this discussion :o Really interesting
>> problem... In deep learning problems like this are usually solved by what
>> is called data-augmentation, where you successively add different levels of
>> noise and shifts to your data to a) get more training data and b) be more
>> resilient to overfitting. I guess we could use a similar approach for this
>> problem. On data-augmentation:
>> https://www.techopedia.com/definition/28033/data-augmentation
>> https://wwwf.imperial.ac.uk/~dvandyk/Research/01-jcgs-art.pdf
>> http://jmlr.org/proceedings/papers/v38/gan15.pdf Or just Google ;D
>>
>>
>> Best
>>
>> Pascal
>>
>> ____________________________
>>
>> BE THE CHANGE YOU WANT TO SEE IN THE WORLD ...
>>
>>
>> On 25 Oct 2015, at 16:17, Richard Crowder <[email protected]> wrote:
>>
>> For reference;
>> http://www.intmath.com/blog/mathematics/math-of-ecgs-fourier-series-4281
>>
>> On Sun, Oct 25, 2015 at 2:16 PM, Richard Crowder <[email protected]> wrote:
>>
>>> Also, covering Sergey's first question, what anomalies are you looking
>>> for?
>>> See this report that uses a Fourier Transform and its inverse for
>>> R-peak detection;
>>>
>>> http://www.egr.msu.edu/classes/ece480/capstone/spring13/group03/documents/SignalProcessingofECGSignalsinMatlab.pdf
>>>
>>> PS: I have Matlab if required/helps. Although other ways in Python can
>>> be used, see nupic.critic for example.
>>>
>>> On Sat, Oct 24, 2015 at 6:13 PM, Richard Crowder <[email protected]> wrote:
>>>
>>>> Kentaro, Sergey,
>>>>
>>>> I've been trying to get my head around available data for
>>>> training/testing.
>>>>
>>>> For example, https://physionet.org/physiobank/ Graph viewing and
>>>> download via
>>>> https://physionet.org/cgi-bin/atm/ATM?database=mimic2wdb&tool=plot_waveforms
>>>> (River view applicable?)
>>>>
>>>> Any idea what could be the best form of data, and which kind of data to
>>>> obtain (ECG only?, with arterial blood pressure, need for labeling?). See
>>>> graph here https://physionet.org/physiobank/database/mimic2wdb/
>>>>
>>>> Best regards, Richard.
>>>>
>>>>
>>>> On Thu, Oct 22, 2015 at 1:12 PM, 飯塚健太郎 <[email protected]> wrote:
>>>>
>>>>> Richard, Sergey,
>>>>> Thank you for the replies.
>>>>>
>>>>> I read the replies carefully, and noticed something.
>>>>>
>>>>> Currently, my code uses raw ECG data with NuPIC’s Scalar Encoder, and
>>>>> TemporalAnomaly for inferenceType.
>>>>>
>>>>> But there is another way:
>>>>> use pre-encoded ECG data to learn and predict anomalies.
>>>>>
>>>>> I found FFT used in the Audio Stream example.
>>>>>
>>>>> https://github.com/numenta/nupic/blob/master/examples/audiostream/audiostream_tp.py#L249
>>>>>
>>>>> It might be better to use Wavelet or another encoding technique.
>>>>> Such a technique makes the data more discrete and might be suitable for
>>>>> detecting anomalies.
>>>>>
>>>>> I think I should learn about encoding techniques.
>>>>> I’ll read the paper Richard suggested, too.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> 2015-10-22 19:36 GMT+09:00 Richard Crowder <[email protected]>:
>>>>>
>>>>>> Hello Kentaro,
>>>>>>
>>>>>> Sergey's questions, response, and paper link are important. The
>>>>>> linked paper is the first I've read on ECG signal analysis, but has a lot
>>>>>> of cross-over with audio and speech signal analysis and recognition. Plus
>>>>>> recent research into steganalysis [1].
>>>>>>
>>>>>> For example -
>>>>>> The use of Wavelet transform, or Fourier Transform / DCT (both magnitude AND phase),
>>>>>> Perceptual linear prediction, as opposed to Mel-Frequency Cepstral analysis,
>>>>>> Very importantly, statistical analysis of spectral features -
>>>>>> Wavelet/DCT with Hilbert transform, spectral envelope curve analysis and
>>>>>> derivative tracking (velocity and acceleration of curve changes, can
>>>>>> limit up to 5th order).
>>>>>>
>>>>>> A lot of this occurs within animals' brains, with mammals adding
>>>>>> additional feedback and inference through the neocortex. As humans, we have
>>>>>> exploited the spectral analysis within our 'old brain' to listen, detect,
>>>>>> and track spectral features. Such as ECG signals, and sonar signals
>>>>>> (hunting for shoals of fish and submarines), for example. Cross-over and
>>>>>> similar analysis occurs in vision sensory analysis too (e.g. edge
>>>>>> detection).
>>>>>>
>>>>>> Which points to the key questions of how you are encoding the ECG
>>>>>> signals? As well as classification techniques?
>>>>>>
>>>>>> Best regards, Richard.
>>>>>>
>>>>>> 1 http://www.shsu.edu/~qxl005/new/publications/tifs_audiosteg.pdf
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 10:18 AM, Sergey Alexashenko <[email protected]> wrote:
>>>>>>
>>>>>>> Actually, I can write out the scenarios here.
>>>>>>>
>>>>>>> NuPIC should definitely be able to learn different people's
>>>>>>> heartbeats in one model. You have to give it plenty of data to learn on.
>>>>>>> Also, make sure to resetSequenceStates every time you start feeding in
>>>>>>> data from a new person. Finally, you might want to shuffle the data so
>>>>>>> that you don't feed it person 1, then person 2, then person 3, but rather
>>>>>>> a mixture of all the data to reduce bias towards the latest people (but I
>>>>>>> don't think that this is necessary to be honest).
>>>>>>>
>>>>>>> There is, however, the issue of encoding. I'm assuming that you are
>>>>>>> using a scalar encoder produced by swarming. That's fine, that's a quick
>>>>>>> approach and it might work (in fact I would bet that it will produce
>>>>>>> usable results - be mindful of swarming on a data set including different
>>>>>>> people's data, though!).
>>>>>>>
>>>>>>> However, if you think about the data type - ECG data, unlike, say,
>>>>>>> EEG data, consists of almost perfectly discrete steps (heartbeats) which
>>>>>>> could be matched to NuPIC timesteps very well. If you run through the
>>>>>>> trouble of extracting features from your data (there is ample
>>>>>>> literature on how to do it - see [1] for example), and creating encoders
>>>>>>> for all the intervals/amplitudes, I think that NuPIC would do a marvelous
>>>>>>> job. Note that this approach condenses the time interval per step to one
>>>>>>> per heartbeat and, thus, is not going to work if you are trying to do
>>>>>>> super-rapid detection or prediction (on a time scale shorter than one
>>>>>>> heartbeat). It is also more time-consuming for you - once again,
>>>>>>> swarming could work well enough.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>>
>>>>>>> Sergey
>>>>>>>
>>>>>>> [1] http://arxiv.org/pdf/1005.0957.pdf
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 1:58 AM, Sergey Alexashenko <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello Kentaro,
>>>>>>>>
>>>>>>>> I think that NuPIC can definitely work with ECG data, but I need a
>>>>>>>> little more information about your project to make any helpful
>>>>>>>> suggestions. Two questions:
>>>>>>>>
>>>>>>>> 1) Are you trying to predict or detect anomalies? You use both
>>>>>>>> terms, but they involve somewhat different mechanisms.
>>>>>>>>
>>>>>>>> 2) How are you encoding ECG data?
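
On the encoding question, one route is the per-heartbeat feature extraction Sergey describes above: find the R-peaks, then turn each beat into an interval and an amplitude that could each get their own encoder. Below is a rough sketch of that in Python with NumPy. Note this is a deliberately naive threshold and local-maximum detector of my own, not the Fourier-based method in the report linked earlier in the thread, and the names find_r_peaks and beat_features, the threshold ratio, the refractory period and the synthetic pulse train are all assumptions for illustration. Real recordings would need baseline and noise handling first.

```python
import numpy as np

def find_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Indices of local maxima above a threshold, at least refractory_s apart.

    threshold_ratio and refractory_s are illustrative assumptions.
    """
    ecg = np.asarray(ecg, dtype=float)
    threshold = threshold_ratio * ecg.max()
    min_gap = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i - 1] < ecg[i] >= ecg[i + 1]
        if ecg[i] > threshold and is_local_max:
            # Skip candidates that fall too soon after the previous peak.
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return np.array(peaks, dtype=int)

def beat_features(ecg, fs):
    """One (RR interval in seconds, R amplitude) pair per beat after the first."""
    ecg = np.asarray(ecg, dtype=float)
    peaks = find_r_peaks(ecg, fs)
    rr_intervals = np.diff(peaks) / fs
    amplitudes = ecg[peaks[1:]]
    return list(zip(rr_intervals, amplitudes))

# Synthetic stand-in for an ECG: one narrow Gaussian pulse every 0.8 s (75 bpm).
fs = 256
t = np.arange(10 * fs) / fs
ecg = np.exp(-0.5 * (((t % 0.8) - 0.4) / 0.02) ** 2)
features = beat_features(ecg, fs)
print(features[:3])
```

Each pair is then one timestep for the model, which is exactly the trade-off Sergey points out: one step per heartbeat, so nothing faster than a beat can be detected.
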
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Sergey
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 10:07 PM, Kentaro Iizuka <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello NuPIC.
>>>>>>>>>
>>>>>>>>> Thank you Matt for the post.
>>>>>>>>>
>>>>>>>>> Here are the details of my question. (It is the same as the gitter post)
>>>>>>>>> https://gist.github.com/iizukak/72526863d3f504f2ff5e
>>>>>>>>>
>>>>>>>>> I hope somebody has a good idea for that.
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-10-22 13:29 GMT+09:00 Matthew Taylor <[email protected]>:
>>>>>>>>> > Hello NuPIC,
>>>>>>>>> >
>>>>>>>>> > Check this out: https://gitter.im/numenta/htm-challenge/archives/2015/10/21
>>>>>>>>> >
>>>>>>>>> > Watch the ECG anomaly in the video: https://youtu.be/5KdwV-trMhE?t=1m41s
>>>>>>>>> >
>>>>>>>>> > He has an interesting question about how to train a model on a healthy
>>>>>>>>> > heartbeat, and it is expressed well with pictures in the link above. He
>>>>>>>>> > wants to train a model with the ECG history of more than one person to get a
>>>>>>>>> > representation of a "healthy heartbeat". The problem is that every person's
>>>>>>>>> > heartbeat is a little different. Is it feasible to train a model on multiple
>>>>>>>>> > heartbeats in sequence? I'm not sure if it will work, but maybe someone has
>>>>>>>>> > a better idea?
>>>>>>>>> >
>>>>>>>>> > Solving this problem would help in a lot of different signal analysis
>>>>>>>>> > applications of HTM...
>>>>>>>>> >
>>>>>>>>> > ---------
>>>>>>>>> > Matt Taylor
>>>>>>>>> > OS Community Flag-Bearer
>>>>>>>>> > Numenta
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Kentaro Iizuka<[email protected]>
>>>>>>>>>
>>>>>>>>> Github
>>>>>>>>> https://github.com/iizukak/
>>>>>>>>>
>>>>>>>>> Facebook
>>>>>>>>> https://www.facebook.com/kentaroiizuka
>>>>>>>>>
>>>>>
>>>>> --
>>>>> 飯塚健太郎([email protected])
>>>>>
>>>>> Saitama University, Graduate School of Science and Engineering
>>>>> Cryptographic Infrastructure Laboratory
>>>>> Master's program, first year
>>>>>
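
PS: to tie the sampling, windowing and DFT notes at the top of this message together, here is a rough sketch in Python with NumPy. It tapers a block of samples with a Hann window, takes the FFT, and picks the strongest peak inside the physiological band, using the two MHR formulas quoted above as the upper bound. Everything specific here is my own assumption for illustration: the function names, the 30 bpm lower bound (needed to skip the DC end of the spectrum), the 256 Hz sample rate, the age, and the synthetic pulse train standing in for real ECG data.

```python
import numpy as np

def max_heart_rate(age):
    """The two MHR formulas quoted in this message, in beats per minute."""
    return 220.0 - age, 206.9 - 0.67 * age

def estimate_bpm(ecg, fs, age, min_bpm=30.0):
    """Strongest spectral peak between min_bpm and the larger MHR estimate.

    min_bpm is an illustrative assumption to keep clear of DC.
    """
    ecg = np.asarray(ecg, dtype=float)
    # Remove the mean (DC) and taper the block with a Hann window
    # to limit spectral leakage before taking the DFT.
    windowed = (ecg - ecg.mean()) * np.hanning(len(ecg))
    magnitude = np.abs(np.fft.rfft(windowed))
    bpm_axis = np.fft.rfftfreq(len(ecg), d=1.0 / fs) * 60.0
    band = (bpm_axis >= min_bpm) & (bpm_axis <= max(max_heart_rate(age)))
    return bpm_axis[band][np.argmax(magnitude[band])]

# Synthetic stand-in for an ECG: one Gaussian pulse every 0.8 s (75 bpm),
# sampled at 256 Hz for 4096 samples (a power of 2, i.e. 16 s of data).
fs = 256
n = 4096
t = np.arange(n) / fs
rng = np.random.default_rng(0)
ecg = np.exp(-0.5 * (((t % 0.8) - 0.4) / 0.05) ** 2) + rng.normal(0.0, 0.05, n)

bpm = estimate_bpm(ecg, fs, age=30)
resolution = 60.0 * fs / n  # width of one DFT bin, in bpm
print(bpm, resolution)
```

The estimate can only be as fine as one DFT bin, so a longer block gives better frequency resolution at the cost of slower reaction to change. Swapping np.hanning for another window from the Harris paper is a one-line change if you want to compare leakage.
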
