Re: HDFS Compression

Kyle Richardson Fri, 04 Nov 2016 18:42:07 -0700

Possibly naive question... Has there been past discussion on the use of
Avro for the data in HDFS?
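
For context on why the container format matters here: gzip over a whole JSON
file produces a single stream that can only be decoded from the start, whereas
Avro object container files store records in blocks separated by a sync
marker, so a reader can begin at the next marker after any offset. Below is a
minimal Python sketch of that idea only. It is not Avro's actual on-disk
layout, and the marker value, block size, record fields, and function names
are all made up for illustration.

```python
import gzip
import json
import zlib

# Hypothetical 16-byte marker; Avro generates a random one per file.
SYNC = b"--hypothetical--"


def make_records(n):
    """Made-up JSON lines standing in for messages landed in HDFS."""
    return [json.dumps({"id": i, "msg": "example"}).encode() + b"\n" for i in range(n)]


def write_blocks(records, per_block=100):
    """Compress records in independent blocks, each preceded by SYNC and a length."""
    out = bytearray()
    for i in range(0, len(records), per_block):
        payload = zlib.compress(b"".join(records[i:i + per_block]))
        out += SYNC + len(payload).to_bytes(4, "big") + payload
    return bytes(out)


def read_blocks_from(data, offset):
    """Read every block that starts at or after offset, as a split reader would."""
    pos = data.find(SYNC, offset)  # skip ahead to the next block boundary
    lines = []
    while pos != -1 and pos < len(data):
        pos += len(SYNC)
        size = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        lines.extend(zlib.decompress(data[pos:pos + size]).splitlines())
        pos += size
    return lines


def can_read_gzip_from(data, offset):
    """True if a gzip stream can be decoded starting at offset."""
    try:
        gzip.decompress(data[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

With a gzip stream built from make_records(1000), can_read_gzip_from should
only succeed at offset 0, while read_blocks_from on the block-structured bytes
can start from the middle and still return whole records from the following
blocks. That is roughly the property a splittable format gives to parallel
readers.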


-Kyle

On Tue, Oct 11, 2016 at 4:30 PM, Matt Foley <[email protected]> wrote:

> Some of the things that are desirable to do with stored data (including
> those mentioned by others below):
> - Use it to train ML models
>   o This implies that the format of records stored in HDFS and the format of
>     records streamed to a “Threat Intel” topology should be readily
>     transformable into each other via simple filters – preferably very simple.
> - Reprocess as time series data
> - Aggregation, Summarization
> - Graphs, Pivot charts
> - Ad-hoc queries via Hive and Spark, about almost any aspect of the data
> - Investigation / discovery with Zeppelin, Tableau, or similar tools
> - CEP analysis (not necessarily all in ES)
> - Future integration with other data in a Data Lake
>
> --Matt
>
>
> On 10/11/16, 10:20 AM, "Otto Fowler" <[email protected]> wrote:
>
>     And also support the extensibility offered by Stellar and enrichments,
>     such that adding new fields using either will not mean having to write
>     supporting Java code, etc.
>
>     Or, from a higher level: the flexibility for configuration-based
>     enrichment and modification of the data through ingest should not be
>     lost to storage requirements.
>
>     On October 11, 2016 at 13:13:43, Carolyn Duby ([email protected])
> wrote:
>
>     The format should be compatible with, and ideally optimal for, Spark and
>     Zeppelin, and perhaps other interactive BI tools like Tableau.
>
>     Thanks
>     Carolyn
>
>
>
>
>     On 10/11/16, 1:06 PM, "Nick Allen" <[email protected]> wrote:
>
>     >Right. The original idea is to do batch analytics. Kind of difficult to
>     >work with data sitting in an ES index. But if we get a better
>     >understanding of the type of batch analytics, it might get us closer to
>     >the target.
>     >
>     >On Tue, Oct 11, 2016 at 1:03 PM, [email protected] <[email protected]>
>     wrote:
>     >
>     >> I'm somewhat ignorant here, never having used the MaaS stuff yet, but
>     >> isn't that the dataset that the models would run against? I understand
>     >> there could be additional use cases, I just wanted to be clear.
>     >>
>     >> Jon
>     >>
>     >> On Tue, Oct 11, 2016 at 1:01 PM Nick Allen <[email protected]>
> wrote:
>     >>
>     >> > I don't think we put much thought into how exactly the data should be
>     >> > landed in HDFS and for what use cases. It just has not been a
>     >> > priority.
>     >> >
>     >> > That being said, this might be a good time to gather everyone's
>     >> > thoughts on how they would use that kind of data and for what purposes.
>     >> >
>     >> >
>     >> >
>     >> > On Tue, Oct 11, 2016 at 12:11 PM, Owen O'Malley <
> [email protected]>
>     >> > wrote:
>     >> >
>     >> > > Be careful of using compressed JSON, since it isn't splittable. JSON
>     >> > > is also very slow for reading.
>     >> > >
>     >> > > .. Owen
>     >> > >
>     >> > > On Tue, Oct 11, 2016 at 4:31 AM, Casey Stella <
> [email protected]>
>     >> > wrote:
>     >> > >
>     >> > > > I'd also tack on to this that the configuration for the HDFS
>     >> > > > writer should be moved to ZooKeeper rather than done in Flux, IMO.
>     >> > > >
>     >> > > > On Tue, Oct 11, 2016 at 07:20 Otto Fowler <[email protected]> wrote:
>     >> > > >
>     >> > > > > The storage format and retrieval from that format should be
>     >> > > > > configurable, that is a ‘boundary’ for Metron so to speak.
>     >> > > > >
>     >> > > > > On October 10, 2016 at 16:15:12, [email protected]
>     >> > > > > ([email protected]) wrote:
>     >> > > > >
>     >> > > > > Is there a specific reason why the JSON files stored in HDFS are
>     >> > > > > not compressed? I looked for some related JIRAs and mail
>     >> > > > > conversations but couldn't find this already mentioned. I'm
>     >> > > > > wondering if there was a good enough argument to keep things
>     >> > > > > uncompressed, or if the subject just hadn't been broached yet.
>     >> > > > >
>     >> > > > > Jon
>     >> > > > > --
>     >> > > > >
>     >> > > > > Jon
>     >> > > > >
>     >> > > >
>     >> > >
>     >> >
>     >> >
>     >> >
>     >> > --
>     >> > Nick Allen <[email protected]>
>     >> >
>     >> --
>     >>
>     >> Jon
>     >>
>     >
>     >
>     >
>     >--
>     >Nick Allen <[email protected]>
>
>
>
>
