Hi all, I am attaching the document that describes how Spot uses LDA to perform anomaly detection on network events. I have also received multiple questions about how the ‘user scoring’ (‘feedback’) of particular items in the suspicious connects report (in the UI layer) is used in ML. The attached document does not go into much detail on this functionality, so I thought I’d put an explanation out here. We can then discuss questions about it, and what additional info should be included in the attached document.
The Spot team feels that changes are needed to this ‘feedback’ functionality, and sees these changes as happening alongside improvements to how context from an LDA model trained on one batch of data is carried forward to the next training run (or even to training in a streaming use case). The value of ‘feedback’ depends on the quality of the model context we can carry over.
The idea for feedback is as follows. Items that are scored with a 1 (i.e. the user identifies the item as benign and does not want to see it in the suspicious connects report anymore) are used to let the machine learning component know that such an entry should no longer be considered suspicious. Currently this is done by injecting artificial log entries into the next batch of data, so that LDA sees many such entries and therefore no longer treats them as anomalies. We have ideas for other ways to provide this functionality. For example, we could filter entries matching the identified pattern out of the next batch BEFORE ML runs on it.
For items that the user scores as ‘3’ in the UI (for example, the user sees an IP as so suspicious that they want to see all future log entries associated with that IP), we could filter future items matching such a pattern so that they skip ML and are instead reported in a separate pane of the UI, or inserted at the top of the most suspicious events.
Comments, questions?
Brandon
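P.S. To make the two filtering ideas above a bit more concrete, here is a rough Python sketch. It is not how Spot works today (the current mechanism is the injection of artificial log entries), and everything in it is an assumption made for illustration: the field names `src_ip` and `dst_ip`, the helper names `matches` and `split_batch`, the representation of a feedback pattern as a dict of field values, and the rule that a score of 3 takes precedence over a score of 1 when both match.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]      # one log entry, as field name -> value (hypothetical shape)
Pattern = Dict[str, str]    # fields that must match for feedback to apply


def matches(event: Event, pattern: Pattern) -> bool:
    """Return True if every field named in the pattern has the same value in the event."""
    return all(event.get(field) == value for field, value in pattern.items())


def split_batch(
    events: List[Event], feedback: List[Tuple[Pattern, int]]
) -> Tuple[List[Event], List[Event]]:
    """Split a batch before ML runs on it.

    Returns (for_ml, flagged):
      for_ml  - entries that still go through LDA scoring
      flagged - entries matching a pattern scored 3, reported without ML
    Entries matching only patterns scored 1 (benign) are dropped.
    """
    for_ml: List[Event] = []
    flagged: List[Event] = []
    for event in events:
        scores = {score for pattern, score in feedback if matches(event, pattern)}
        if 3 in scores:
            # Assumption: a "very suspicious" score wins over a "benign" score.
            flagged.append(event)
        elif 1 in scores:
            # Benign by user feedback: keep it out of the ML input entirely.
            continue
        else:
            for_ml.append(event)
    return for_ml, flagged
```

With feedback such as `[({"dst_ip": "8.8.8.8"}, 1), ({"src_ip": "10.0.0.2"}, 3)]`, entries going to 8.8.8.8 would be removed from the batch, entries from 10.0.0.2 would be returned in `flagged` for the separate pane or the top of the report, and everything else would be passed to ML as usual.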
