I don't believe this list permits attachments, Brandon. Perhaps post it to Google Docs and send out a link?
Alan

On Thu, May 25, 2017 at 10:27 AM, Edwards, Brandon <[email protected]> wrote:

> Hi all,
>
> I am attaching the document that describes how Spot uses LDA in order to
> perform anomaly detection on network events. I have also received multiple
> questions related to how the ‘user scoring’ (‘feedback’) of particular
> items in the suspicious connects report (in the UI layer) is used in ML. We
> have not provided much detail on this functionality in the attached
> document. I thought I’d put an explanation out there and we can discuss
> questions related to my explanation and discuss what additional info should
> be included in the attached document.
>
> The Spot team feels that changes are needed to this ‘feedback’
> functionality, and sees these changes as happening concurrently with
> improvements to the ability for context from an LDA model trained on a
> given batch of data to be carried forward to the next training run (or even
> training in a streaming use case). The value of ‘feedback’ is dependent on
> the quality of the model-context we can carry over.
>
> The idea for feedback is as follows. The items that are scored with a 1
> (i.e. the user identifies the item as benign and so does not want to see it
> in the suspicious connects report anymore) will be used for letting the
> machine learning component know that such an entry should not be considered
> as suspicious anymore. Currently this is done by injecting artificial log
> entries into the next batch of data so that LDA sees many such entries and
> therefore no longer sees them as anomalies.
>
> We have ideas for other ways to allow this functionality - for example we
> could filter entries matching the identified pattern from the next batch
> run BEFORE ML runs on the batch. For items that are scored by the user in
> the UI as ‘3’ (for example the user sees an IP as so suspicious that we
> want to see all future log entries associated with that IP) we could filter
> future items matching such a pattern in order to skip ML and instead report
> them in a separate pane of the UI or insert them at the top of the most
> suspicious events.
>
> Comments, questions?
>
> Brandon
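
To make the two feedback options in the quoted message concrete, here is a minimal Python sketch. It is not Spot's implementation. The function names, the dict-shaped log entries, the `copies` count, and the `key` function that defines a "matching pattern" are all assumptions made for illustration; only the score values 1 (benign) and 3 (highly suspicious) come from the message.

```python
# Hypothetical sketch of the two feedback strategies discussed above.
# Score values are taken from the message; everything else is assumed.
BENIGN, HIGH_RISK = 1, 3


def inject_benign(batch, feedback, copies=1000):
    """Current approach: append many artificial copies of each entry
    scored as benign, so the model sees the pattern as common."""
    extra = []
    for entry, score in feedback:
        if score == BENIGN:
            extra.extend([entry] * copies)
    return batch + extra


def split_by_feedback(batch, feedback, key):
    """Proposed alternative: route entries BEFORE ML runs.

    Entries matching a benign pattern are dropped, entries matching a
    highly suspicious pattern skip ML and are returned separately, and
    everything else goes on to ML.
    """
    benign_keys = {key(e) for e, s in feedback if s == BENIGN}
    risk_keys = {key(e) for e, s in feedback if s == HIGH_RISK}
    to_ml, flagged = [], []
    for entry in batch:
        k = key(entry)
        if k in risk_keys:
            flagged.append(entry)   # report directly, bypassing ML
        elif k not in benign_keys:
            to_ml.append(entry)     # unscored pattern: score with ML
    return to_ml, flagged


batch = [
    {"ip": "10.0.0.1", "port": 80},
    {"ip": "10.0.0.2", "port": 22},
    {"ip": "10.0.0.3", "port": 443},
]
feedback = [
    ({"ip": "10.0.0.1", "port": 80}, BENIGN),
    ({"ip": "10.0.0.2", "port": 22}, HIGH_RISK),
]

augmented = inject_benign(batch, feedback, copies=5)
to_ml, flagged = split_by_feedback(batch, feedback, key=lambda e: e["ip"])
```

With injection, the outcome still depends on how the model weighs the artificial entries. With the pre-ML split, the user's score is applied deterministically, at the cost of having to define what counts as a matching pattern (here, simply the IP).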
