And also support the extensibility offered by STELLAR and enrichments, such that adding new fields using either will not mean having to write supporting Java code, etc.
Or from a higher level: The flexibility for configuration based enrichment and modification of the data through ingest should not be lost for storage requirements.

On October 11, 2016 at 13:13:43, Carolyn Duby ([email protected]) wrote:

The format should be compatible/optimal with spark and Zeppelin. Perhaps other interactive BI tools like Tableau.

Thanks
Carolyn

On 10/11/16, 1:06 PM, "Nick Allen" <[email protected]> wrote:

>Right. The original idea is to do batch analytics. Kind of difficult to
>work with data sitting in an ES index. But if we get a better understanding
>of the type of batch analytics, it might get us closer to the target.
>
>On Tue, Oct 11, 2016 at 1:03 PM, [email protected] <[email protected]> wrote:
>
>> I'm somewhat ignorant here, never having used the MaaS stuff yet, but isn't
>> that the dataset that the models would run against? I understand there
>> could be additional use cases, I just wanted to be clear.
>>
>> Jon
>>
>> On Tue, Oct 11, 2016 at 1:01 PM Nick Allen <[email protected]> wrote:
>>
>> > I don't think we put much thought into how exactly the data should be
>> > landed in HDFS and for what use cases. It just has not been a priority.
>> >
>> > That being said, this might be a good time to gather everyone's thoughts on
>> > how they would use that kind of data and for what purposes.
>> >
>> > On Tue, Oct 11, 2016 at 12:11 PM, Owen O'Malley <[email protected]> wrote:
>> >
>> > > Be careful of using compressed JSON, since it isn't splittable. JSON is
>> > > also very slow for reading.
>> > >
>> > > .. Owen
>> > >
>> > > On Tue, Oct 11, 2016 at 4:31 AM, Casey Stella <[email protected]> wrote:
>> > >
>> > > > I'd also tack on to this that the configuration for the hdfs writer should
>> > > > be moved to zookeeper rather than done in flux, IMO
>> > > > On Tue, Oct 11, 2016 at 07:20 Otto Fowler <[email protected]> wrote:
>> > > >
>> > > > > The storage format and retrieval from that format should be configurable,
>> > > > > that is a ‘boundary’ for Metron so to speak.
>> > > > >
>> > > > > On October 10, 2016 at 16:15:12, [email protected] ([email protected]) wrote:
>> > > > >
>> > > > > Is there a specific reason why the JSON files stored in HDFS are not
>> > > > > compressed? I looked for some related JIRAs and mail conversations but
>> > > > > couldn't find this already mentioned. I'm wondering if there was a good
>> > > > > enough of an argument to keep things uncompressed, or if the subject just
>> > > > > hadn't been broached yet.
>> > > > >
>> > > > > Jon
>> > > > > --
>> > > > >
>> > > > > Jon
>> >
>> > --
>> > Nick Allen <[email protected]>
>> >
>> --
>>
>> Jon
>
>--
>Nick Allen <[email protected]>
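
Owen's caution about compressed JSON not being splittable can be illustrated with a minimal Python sketch. It assumes newline-delimited JSON and gzip as the codec; the thread names neither, and the sample records and helper names (`read_plain_split`, `can_decode_from`) are hypothetical. Uncompressed lines can be divided at any byte offset and each part parsed independently, while a gzip stream cannot be decoded starting from the middle.

```python
import gzip
import json
import zlib

# Hypothetical sample data: newline-delimited JSON, one record per line.
records = [{"id": i, "msg": "event-%d" % i} for i in range(1000)]
raw = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
compressed = gzip.compress(raw)  # gzip is an assumed codec, not named in the thread


def read_plain_split(data, start, end):
    """Parse every line whose first byte falls in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # A line starts right after a newline; a line straddling `start`
        # belongs to the previous split, so skip ahead to the next line start.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []
        pos = nl + 1
    out = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            nl = len(data)
        out.append(json.loads(data[pos:nl]))
        pos = nl + 1
    return out


def can_decode_from(blob, offset):
    """Return True if gzip decoding succeeds when started at `offset`."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Plain JSON lines: two workers can each take half of the bytes.
mid = len(raw) // 2
first_half = read_plain_split(raw, 0, mid)
second_half = read_plain_split(raw, mid, len(raw))
```

Here `first_half + second_half` reproduces `records`, whereas `can_decode_from(compressed, len(compressed) // 2)` fails because a reader cannot pick up a gzip stream mid-file. A single large gzip file therefore has to be read end to end by one task, which limits parallelism for batch analytics.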
