The idea would be to use the online optimizer: first train the model on a whole day’s worth of data to establish a foothold and find anomalies within that first day. From then on, mini-batches would be brought in (in near real time) to further train the model and to evaluate the most recent anomalies. Do you have thoughts on this topic, Giacomo? Are you hoping to contribute?
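As a rough illustration of the schedule described above (a full-day warm start, then mini-batch updates with scoring after each one), here is a minimal Python sketch. It is not LDA and does not call Spark's OnlineLDAOptimizer: a simple running word-frequency model stands in for the topic model, and the names `OnlineWordModel`, `most_anomalous`, and `run`, the word strings, and the parameter values are all hypothetical. The step size uses the (tau0 + t)^(-kappa) decay familiar from online variational Bayes for LDA.

```python
from collections import Counter


class OnlineWordModel:
    """Hypothetical stand-in for an online topic model (NOT LDA).

    Keeps a running word distribution and blends each new batch into it.
    """

    def __init__(self, tau0=1.0, kappa=0.7, floor=1e-6):
        self.tau0 = tau0      # assumed value; larger tau0 damps early updates
        self.kappa = kappa    # assumed value; controls how fast the step size decays
        self.floor = floor    # lower bound on any score (covers unseen words)
        self.probs = {}       # word -> estimated probability
        self.step = 0         # number of batches absorbed so far

    def update(self, words):
        """Blend the batch's empirical word distribution into the model."""
        if not words:
            return
        counts = Counter(words)
        total = sum(counts.values())
        # The first batch initializes the model; later batches use a decaying step.
        rho = 1.0 if self.step == 0 else (self.tau0 + self.step) ** -self.kappa
        for word in set(self.probs) | set(counts):
            old = self.probs.get(word, 0.0)
            new = counts.get(word, 0) / total
            self.probs[word] = (1.0 - rho) * old + rho * new
        self.step += 1

    def score(self, word):
        """Lower score means more anomalous."""
        return max(self.probs.get(word, 0.0), self.floor)


def most_anomalous(model, words, n=1):
    """Return the n distinct words in the batch with the lowest scores."""
    return sorted(set(words), key=lambda w: (model.score(w), w))[:n]


def run(first_day, minibatches, n=1):
    model = OnlineWordModel()
    model.update(first_day)                      # foothold from a full day of data
    flagged = [most_anomalous(model, first_day, n)]
    for batch in minibatches:                    # arriving in near real time
        model.update(batch)                      # further train on the new batch
        flagged.append(most_anomalous(model, batch, n))
    return model, flagged


# Made-up example data: one "day" of words, then a single mini-batch.
first_day = ["tcp_80"] * 90 + ["tcp_443"] * 9 + ["udp_6667"]
model, flagged = run(first_day, [["tcp_80"] * 8 + ["tcp_31337"] * 2])
print(flagged)
```

In a real pipeline the stand-in model would be replaced by the LDA model itself; the sketch only shows the ordering of training and scoring, and whether each mini-batch should be scored before or after it updates the model is a design choice left open here.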
Brandon

On 6/20/17, 10:01 AM, "Giacomo Bernardi" <[email protected]> wrote:

Thanks. I wasn't referring to extra time based series, but to the topic modelling and anomaly detection itself.

So, plan is to use OnlineLDAOptimizer with mini-batches of the last (few?) minutes, then?

G.

On 20 June 2017 at 17:45, Edwards, Brandon <[email protected]> wrote:
> Giacomo,
>
> Spark has an online optimizer for LDA which would enable the use of LDA in a mini-batch or streaming use case. However, if you are talking about machine learning that would look for anomalies that incorporate time-based features, we would like to explore this. It’s on the road map, but is not being worked on right now. We have thought of including new time based features into the LDA model, and/or training additional time series models to be included with LDA in a model-ensemble.
>
> Brandon
>
> On 6/20/17, 8:58 AM, "Giacomo Bernardi" <[email protected]> wrote:
>
> Hi Brandon and all,
> I'm resuming this thread to check whether any thought has already been
> given to such "streaming use case".
>
> Are you planning of somehow using streaming-LDA in that case too? Or
> something different (fancy RNNs? HTM?) to model the state of each IP?
>
> Thanks,
> Giacomo
>
>
> On 25 May 2017 at 18:27, Edwards, Brandon <[email protected]> wrote:
>
> > The Spot team feels that changes are needed to this ‘feedback’
> > functionality, and see these changes as happening concurrent with
> > improvements to the ability for context from an LDA model trained on a given
> > batch of data to be carried forward to the next training run (or even
> > training in a streaming use case). The value of ‘feedback’ is dependent on
> > the quality of the model-context we can carry over.
