Completely agree. We recently added this markdown document to the spot-ml
folder:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SUSPICIOUS_CONNECTS_SCHEMA.md
But we can always improve it.

Going back to the main issue, if people think it’s OK, I’ll create issues for:

- Having spot-ml check the schema of Flow, DNS and Proxy input data
- Making the documentation about the schema spot-ml requires more consistent
  for users who are not using spot-ingest
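
To make the first item a bit more concrete, here is a rough sketch of the kind
of check I have in mind. It is plain Python for illustration only (spot-ml
itself is Scala/Spark), and the column names and types in REQUIRED_SCHEMAS are
placeholders I made up for the example; the real lists are the ones in
SUSPICIOUS_CONNECTS_SCHEMA.md. The function name validate_schema is also
hypothetical.

```python
# Hypothetical required columns per data type. These are illustrative
# placeholders, NOT the actual spot-ml schema (see SUSPICIOUS_CONNECTS_SCHEMA.md).
REQUIRED_SCHEMAS = {
    "flow": {"sip": "string", "dip": "string", "ipkt": "bigint"},
    "dns": {"ip_dst": "string", "dns_qry_name": "string"},
}


def validate_schema(data_type, actual_schema):
    """Return a list of problems found; an empty list means the input is usable.

    actual_schema maps column name -> type name for the input data.
    Extra columns are accepted on purpose, so the check only rejects input
    that is missing something the ML job needs.
    """
    if data_type not in REQUIRED_SCHEMAS:
        return ["unknown data type: " + data_type]

    problems = []
    for column, expected in REQUIRED_SCHEMAS[data_type].items():
        if column not in actual_schema:
            problems.append("missing column: " + column)
        elif actual_schema[column] != expected:
            problems.append(
                "column %s has type %s, expected %s"
                % (column, actual_schema[column], expected)
            )
    return problems


if __name__ == "__main__":
    # Input with an extra column passes; input with a missing or mistyped
    # column is reported before any processing starts.
    print(validate_schema("flow", {"sip": "string", "dip": "string",
                                   "ipkt": "bigint", "extra": "int"}))
    print(validate_schema("flow", {"sip": "string", "ipkt": "int"}))
```

Only checking for required columns, and ignoring extra ones, is one way to
keep the validation from locking things down too much for people who bring
their own data. Reporting every problem at once, rather than failing on the
first, should also make it easier to fix an external data source in one pass.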


On 6/22/17, 1:37 PM, "Michael Ridley" <[email protected]> wrote:

    I agree, having a data model defined and documented would help a lot in
    separating processing from a specific ingest flow.
    
    Michael
    
    On Thu, Jun 22, 2017 at 1:31 PM, Jonathan Natkins <[email protected]>
    wrote:
    
    > Personally, I'd love for there to be more information about the expected
    > schema for the ML jobs, as well as information about where the data can be
    > picked up from. The documentation seems to be mostly written with a
    > specific example in mind, so is not extremely helpful when trying to
    > integrate new data sources. A data dictionary would help with being able to
    > map fields from data formats (other logs, etc) to fields that spot-ml can
    > process.
    >
    > Whatever happened to the open data model that was being discussed for Spot?
    >
    > Thanks!
    > Natty
    >
    > On Thu, Jun 22, 2017 at 10:10 AM Barona, Ricardo <[email protected]>
    > wrote:
    >
    > > Hi everyone.
    > >
    > > I’m happy to see more and more people playing with Spot, and
    > > particularly with spot-ml.
    > >
    > > Something that I’ve noticed thanks to these two Jira issues (
    > > https://issues.apache.org/jira/browse/SPOT-149 and
    > > https://issues.apache.org/jira/browse/SPOT-174) is that sometimes users
    > > are going to want to try spot-ml without ingesting data using spot-ingest
    > > and I think that’s cool but seems like that can lead to inconsistent
    > > schema issues.
    > >
    > > I’d like to know what you think, what would be the best approach to deal
    > > with this; I’m thinking that we can add schema validation to spot-ml
    > > before anything else happens but I don’t know if that’s going to lock
    > > things too much.
    > >
    > > Please share your thoughts.
    > >
    > > Thanks,
    > > Ricardo Barona
    > >
    > --
    > Jonathan "Natty" Natkins
    > StreamSets | Field Engineering Director
    > mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
    >
    
    
    
    -- 
    Michael Ridley <[email protected]>
    office: (650) 352-1337
    mobile: (571) 438-2420
    Senior Solutions Architect
    Cloudera, Inc.
    
