On the scoring piece. 1 has traditionally been "Bad" and 3 has been "Benign". Are we changing that?
Alan

On Thu, May 25, 2017 at 10:49 AM, Alan Ross <[email protected]> wrote:

> I don't believe this list permits attachments Brandon. Perhaps post it to
> google docs and send out a link?
>
> Alan
>
> On Thu, May 25, 2017 at 10:27 AM, Edwards, Brandon <[email protected]> wrote:
>
>> Hi all,
>>
>> I am attaching the document that describes how Spot uses LDA in order to
>> perform anomaly detection on network events. I have also received multiple
>> questions related to how the ‘user scoring’ (‘feedback’) of particular
>> items in the suspicious connects report (in the UI layer) is used in ML. We
>> have not provided much detail on this functionality in the attached
>> document. I thought I’d put an explanation out there and we can discuss
>> questions related to my explanation and discuss what additional info should
>> be included in the attached document.
>>
>> The Spot team feels that changes are needed to this ‘feedback’
>> functionality, and see these changes as happening concurrent with
>> improvements to the ability for context from an LDA model trained on a
>> given batch of data to be carried forward to the next training run (or even
>> training in a streaming use case). The value of ‘feedback’ is dependent on
>> the quality of the model-context we can carry over.
>>
>> The idea for feedback is as follows. The items that are scored with a 1
>> (i.e. the user identifies the item as benign and so does not want to see it
>> in the suspicious connects report anymore) will be used for letting the
>> machine learning component know that such an entry should not be considered
>> as suspicious anymore. Currently this is done by injecting artificial log
>> entries into the next batch of data so that LDA sees many such entries and
>> therefore no longer sees them as anomalies.
>>
>> We have ideas for other ways to allow this functionality - for example we
>> could filter entries matching the identified pattern from the next batch
>> run BEFORE ML runs on the batch. For items that are scored by the user in
>> the UI as ‘3’ (for example the user sees an ip as so suspicious that we
>> want to see all future log entries associated to that ip) we could filter
>> future items matching such a pattern in order to skip ML and instead report
>> them in a separate pane of the UI or insert them to the top of the most
>> suspicious events.
>>
>> Comments, Questions?
>>
>> Brandon
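
To make the two feedback options in Brandon's note easier to compare, here is a rough Python sketch. It is not Spot's actual code: the Event and Feedback types, the function names, the copies count, and the rule "match on the exact event" or "match on the scored event's source ip" are all assumptions made for illustration. The score constants follow Brandon's description (1 = benign, 3 = suspicious), which is exactly the convention being questioned above, so they are kept in one place where they can be swapped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified network log entry."""
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Feedback:
    """Hypothetical record of a score given to an event in the UI."""
    event: Event
    score: int


# Score meanings as described in Brandon's mail; swap them here if the
# traditional convention (1 = bad, 3 = benign) turns out to be the right one.
BENIGN_SCORE = 1
SUSPICIOUS_SCORE = 3


def inject_benign(batch: Iterable[Event], feedback: Iterable[Feedback],
                  copies: int = 1000) -> List[Event]:
    """Option A (the approach described as current): append many synthetic
    duplicates of each benign-scored event to the next batch so the model
    sees the pattern as common. `copies` is an arbitrary illustrative value."""
    out = list(batch)
    for fb in feedback:
        if fb.score == BENIGN_SCORE:
            out.extend([fb.event] * copies)
    return out


def prefilter(batch: Iterable[Event],
              feedback: Iterable[Feedback]) -> Tuple[List[Event], List[Event]]:
    """Option B (the proposed alternative): split the batch BEFORE ML runs.
    Returns (for_ml, flagged). Benign-scored events are dropped; events that
    touch a watched ip skip ML and go straight to the flagged list."""
    feedback = list(feedback)
    benign = {fb.event for fb in feedback if fb.score == BENIGN_SCORE}
    # Assumption: the watched ip is the source ip of the scored event.
    watched = {fb.event.src_ip for fb in feedback
               if fb.score == SUSPICIOUS_SCORE}
    for_ml: List[Event] = []
    flagged: List[Event] = []
    for ev in batch:
        if ev.src_ip in watched or ev.dst_ip in watched:
            flagged.append(ev)
        elif ev not in benign:
            for_ml.append(ev)
    return for_ml, flagged
```

One difference the sketch makes visible: option A changes the data the model trains on, so its effect depends on how many copies are injected, while option B leaves the training data otherwise untouched and makes the feedback decision deterministic.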
