Personally, I'd love for there to be more information about the expected schema for the ML jobs, as well as where the data can be picked up from. The documentation seems to be written mostly with a specific example in mind, so it is not very helpful when trying to integrate new data sources. A data dictionary would help with mapping fields from other data formats (other logs, etc.) to fields that spot-ml can process.
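
As a rough illustration of what I mean, here is a minimal Python sketch of a data dictionary plus a pre-flight check. Every name in it (EXPECTED_FIELDS, FIELD_MAP, map_record, validate_record) and every field name and type is a made-up placeholder, not the actual spot-ml schema; it only shows the kind of mapping and validation I have in mind.

```python
# Hypothetical data dictionary: field names and types are illustrative
# placeholders, NOT the real spot-ml schema.
EXPECTED_FIELDS = {
    "timestamp": str,
    "source_ip": str,
    "dest_ip": str,
    "bytes": int,
}

# Hypothetical mapping from another log format's field names to the ones above.
FIELD_MAP = {
    "ts": "timestamp",
    "src": "source_ip",
    "dst": "dest_ip",
    "nbytes": "bytes",
}


def map_record(raw, field_map=FIELD_MAP):
    """Rename the keys of a raw record using field_map; unmapped keys are dropped."""
    return {field_map[key]: value for key, value in raw.items() if key in field_map}


def validate_record(record, expected=EXPECTED_FIELDS):
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    for name, expected_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(
                f"field {name}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

With something like this, a record from a foreign source would go through map_record first, and the ML job would only run if validate_record returned an empty list. Whether a failed check should abort the job or just log a warning is exactly the "lock things too much" question, so I have left that out.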

Whatever happened to the open data model that was being discussed for Spot?

Thanks!
Natty

On Thu, Jun 22, 2017 at 10:10 AM Barona, Ricardo <[email protected]> wrote:

> Hi everyone.
>
> I’m happy to see how more people are playing with Spot, and particularly
> with spot-ml, every time.
>
> Something that I’ve noticed thanks to these two Jira issues (
> https://issues.apache.org/jira/browse/SPOT-149 and
> https://issues.apache.org/jira/browse/SPOT-174) is that sometimes users
> are going to want to try spot-ml without ingesting data using spot-ingest.
> I think that’s cool, but it seems like that can lead to inconsistent schema
> issues.
>
> I’d like to know what you think would be the best approach to deal
> with this. I’m thinking that we can add schema validation to spot-ml before
> anything else happens, but I don’t know if that’s going to lock things down
> too much.
>
> Please share your thoughts.
>
> Thanks,
> Ricardo Barona
>

--
Jonathan "Natty" Natkins
StreamSets | Field Engineering Director
mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
