Possibly naive question... Has there been past discussion on the use of Avro for the data in HDFS?
-Kyle

On Tue, Oct 11, 2016 at 4:30 PM, Matt Foley <[email protected]> wrote:

> Some of the things that are desirable to do with stored data (including
> those mentioned by others below):
> - Use it to train ML models
>   o This implies that the format of records stored in HDFS and the format of
>     records streamed to a “Threat Intel” topology should be readily
>     transformable into each other via simple filters – preferably very simple.
> - Reprocess as time series data
> - Aggregation, Summarization
> - Graphs, Pivot charts
> - Ad-hoc queries via Hive and Spark, about almost any aspect of the data
> - Investigation / discovery with Zeppelin, Tableau, or similar tools
> - CEP analysis (not necessarily all in ES)
> - Future integration with other data in a Data Lake
>
> --Matt
>
>
> On 10/11/16, 10:20 AM, "Otto Fowler" <[email protected]> wrote:
>
> And also support the extensibility offered by STELLAR and enrichments, such
> that adding new fields using either will not mean having to write
> supporting java code etc.
>
> Or from a higher level : The flexibility for configuration based enrichment
> and modification of the data through ingest should not be lost for storage
> requirements.
>
> On October 11, 2016 at 13:13:43, Carolyn Duby ([email protected]) wrote:
>
> The format should be compatible/optimal with spark and Zeppelin. Perhaps
> other interactive BI tools like Tableau.
>
> Thanks
> Carolyn
>
>
> On 10/11/16, 1:06 PM, "Nick Allen" <[email protected]> wrote:
>
> >Right. The original idea is to do batch analytics. Kind of difficult to
> >work with data sitting in an ES index. But if we get a better understanding
> >of the type of batch analytics, it might get us closer to the target.
> >
> >On Tue, Oct 11, 2016 at 1:03 PM, [email protected] <[email protected]> wrote:
> >
> >> I'm somewhat ignorant here, never having used the MaaS stuff yet, but isn't
> >> that the dataset that the models would run against? I understand there
> >> could be additional use cases, I just wanted to be clear.
> >>
> >> Jon
> >>
> >> On Tue, Oct 11, 2016 at 1:01 PM Nick Allen <[email protected]> wrote:
> >>
> >> > I don't think we put much thought into how exactly the data should be
> >> > landed in HDFS and for what use cases. It just has not been a priority.
> >> >
> >> > That being said, this might be a good time to gather everyone's thoughts on
> >> > how they would use that kind of data and for what purposes.
> >> >
> >> > On Tue, Oct 11, 2016 at 12:11 PM, Owen O'Malley <[email protected]>
> >> > wrote:
> >> >
> >> > > Be careful of using compressed JSON, since it isn't splittable. JSON is
> >> > > also very slow for reading.
> >> > >
> >> > > .. Owen
> >> > >
> >> > > On Tue, Oct 11, 2016 at 4:31 AM, Casey Stella <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > I'd also tack on to this that the configuration for the hdfs writer should
> >> > > > be moved to zookeeper rather than done in flux, IMO
> >> > > > On Tue, Oct 11, 2016 at 07:20 Otto Fowler <[email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > The storage format and retrieval from that format should be configurable,
> >> > > > > that is a ‘boundary’ for Metron so to speak.
> >> > > > >
> >> > > > > On October 10, 2016 at 16:15:12, [email protected] ([email protected])
> >> > > > > wrote:
> >> > > > >
> >> > > > > Is there a specific reason why the JSON files stored in HDFS are not
> >> > > > > compressed? I looked for some related JIRAs and mail conversations but
> >> > > > > couldn't find this already mentioned. I'm wondering if there was a good
> >> > > > > enough of an argument to keep things uncompressed, or if the subject just
> >> > > > > hadn't been broached yet.
> >> > > > >
> >> > > > > Jon
> >> > > > > --
> >> > > > >
> >> > > > > Jon
> >> > > > >
> >> >
> >> > --
> >> > Nick Allen <[email protected]>
> >> >
> >> --
> >>
> >> Jon
> >>
> >
> >--
> >Nick Allen <[email protected]>
