Re: HDFS Compression

Matt Foley Tue, 11 Oct 2016 13:31:08 -0700

Some of the things that are desirable to do with stored data (including those 
mentioned by others below):
- Use it to train ML models
  o This implies that the format of records stored in HDFS and the format of
    records streamed to a “Threat Intel” topology should be readily transformable
    into each other via simple filters, preferably very simple ones.
- Reprocess as time series data
- Aggregation, Summarization
- Graphs, Pivot charts
- Ad-hoc queries via Hive and Spark, about almost any aspect of the data
- Investigation / discovery with Zeppelin, Tableau, or similar tools
- CEP analysis (not necessarily all in ES)
- Future integration with other data in a Data Lake
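
To make the "simple filters" idea under the ML bullet concrete, here is a
minimal Python sketch. It assumes the stored form is one JSON object per line
and differs from the streamed form only by a couple of bookkeeping fields. The
field names (stored_at, source_file) and both function names are hypothetical,
chosen for illustration; they are not an existing Metron API or record layout.

```python
import json

# Hypothetical bookkeeping fields that exist only on the stored copy.
STORAGE_ONLY_FIELDS = ("stored_at", "source_file")


def stored_to_streamed(line):
    """Parse one stored JSON line and drop the storage-only fields."""
    record = json.loads(line)
    return {k: v for k, v in record.items() if k not in STORAGE_ONLY_FIELDS}


def streamed_to_stored(record, stored_at, source_file):
    """Add the storage-only fields and serialize as one JSON line."""
    stored = dict(record)
    stored["stored_at"] = stored_at
    stored["source_file"] = source_file
    return json.dumps(stored, sort_keys=True)


# Hypothetical sample message, for illustration only.
message = {"ip_src_addr": "10.0.0.1", "protocol": "tcp"}
line = streamed_to_stored(message, "2016-10-11T13:31:08", "part-00000.json")
print(stored_to_streamed(line) == message)
```

If the two formats differ by more than a field filter like this, the round
trip stops being "very simple", which is the property being asked for.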


--Matt


On 10/11/16, 10:20 AM, "Otto Fowler" <[email protected]> wrote:

    And also support the extensibility offered by STELLAR and enrichments, such
    that adding new fields using either will not mean having to write
    supporting Java code, etc.
    
    Or, from a higher level: the flexibility of configuration-based enrichment
    and modification of the data during ingest should not be lost because of
    storage requirements.
    
    On October 11, 2016 at 13:13:43, Carolyn Duby ([email protected]) wrote:
    
    The format should be compatible with, and ideally optimal for, Spark and
    Zeppelin, and perhaps other interactive BI tools like Tableau.
    
    Thanks
    Carolyn
    
    
    
    
    On 10/11/16, 1:06 PM, "Nick Allen" <[email protected]> wrote:
    
    >Right. The original idea is to do batch analytics. Kind of difficult to
    >work with data sitting in an ES index. But if we get a better understanding
    >of the type of batch analytics, it might get us closer to the target.
    >
    >On Tue, Oct 11, 2016 at 1:03 PM, [email protected] <[email protected]> wrote:
    >
    >> I'm somewhat ignorant here, never having used the MaaS stuff yet, but isn't
    >> that the dataset that the models would run against? I understand there
    >> could be additional use cases, I just wanted to be clear.
    >>
    >> Jon
    >>
    >> On Tue, Oct 11, 2016 at 1:01 PM Nick Allen <[email protected]> wrote:
    >>
    >> > I don't think we put much thought into how exactly the data should be
    >> > landed in HDFS and for what use cases. It just has not been a priority.
    >> >
    >> > That being said, this might be a good time to gather everyone's thoughts on
    >> > how they would use that kind of data and for what purposes.
    >> >
    >> >
    >> >
    >> > On Tue, Oct 11, 2016 at 12:11 PM, Owen O'Malley <[email protected]>
    >> > wrote:
    >> >
    >> > > Be careful of using compressed JSON, since it isn't splittable. JSON is
    >> > > also very slow for reading.
    >> > >
    >> > > .. Owen
    >> > >
    >> > > On Tue, Oct 11, 2016 at 4:31 AM, Casey Stella <[email protected]>
    >> > wrote:
    >> > >
    >> > > > I'd also tack on to this that the configuration for the HDFS writer should
    >> > > > be moved to ZooKeeper rather than done in Flux, IMO
    >> > > > On Tue, Oct 11, 2016 at 07:20 Otto Fowler <[email protected]> wrote:
    >> > > >
    >> > > > > The storage format and retrieval from that format should be configurable;
    >> > > > > that is a ‘boundary’ for Metron, so to speak.
    >> > > > >
    >> > > > > On October 10, 2016 at 16:15:12, [email protected] ([email protected]) wrote:
    >> > > > >
    >> > > > > Is there a specific reason why the JSON files stored in HDFS are not
    >> > > > > compressed? I looked for some related JIRAs and mail conversations but
    >> > > > > couldn't find this already mentioned. I'm wondering if there was a good
    >> > > > > enough argument to keep things uncompressed, or if the subject just
    >> > > > > hadn't been broached yet.
    >> > > > >
    >> > > > > Jon
    >> > > > > --
    >> > > > >
    >> > > > > Jon
    >> > > > >
    >> > > >
    >> > >
    >> >
    >> >
    >> >
    >> > --
    >> > Nick Allen <[email protected]>
    >> >
    >> --
    >>
    >> Jon
    >>
    >
    >
    >
    >--
    >Nick Allen <[email protected]>
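
On Owen's point in the thread above that compressed JSON isn't splittable,
here is a minimal Python sketch of what that means in practice. It assumes
gzip as the codec (the thread does not name one) and uses made-up sample
records; read_split and gzip_readable_from are hypothetical helper names, not
Hadoop or Metron functions. Plain newline-delimited JSON can be cut at any
byte offset because a reader resynchronizes at the next newline, while a gzip
stream can only be decoded from its beginning.

```python
import gzip
import json
import zlib


def read_split(data, start, end):
    """Return the records whose first byte lies in [start, end).

    Works on uncompressed newline-delimited JSON only.
    """
    if start == 0:
        pos = 0
    else:
        # Resynchronize: move to the first record that begins at or after start.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)
        # A record that starts inside the split is read to its end,
        # even if it runs past the split boundary.
        records.append(json.loads(data[pos:newline]))
        pos = newline + 1
    return records


def gzip_readable_from(data, offset):
    """Report whether a gzip decoder can start at the given byte offset."""
    try:
        zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(data[offset:])
        return True
    except zlib.error:
        return False


# Hypothetical sample data, not a real Metron record layout.
events = [{"id": i, "msg": "event-%d" % i} for i in range(1000)]
plain = "".join(json.dumps(e) + "\n" for e in events).encode("utf-8")
packed = gzip.compress(plain)

# Two independent readers, one per half of the uncompressed file.
middle = len(plain) // 2
halves = read_split(plain, 0, middle) + read_split(plain, middle, len(plain))

print(halves == events)
print(gzip_readable_from(packed, 0))
print(gzip_readable_from(packed, len(packed) // 2))
```

This is only about whole-file gzip. Hadoop treats bzip2 as a splittable codec,
and container formats such as ORC, Avro, or SequenceFile compress in blocks, so
they can be compressed and still be split across tasks.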
