Depending on your security model, you may run into challenges integrating
Ranger with your cloud storage, especially if you are planning to use TDE
for encryption at rest. Otherwise, using Metron in that way should be quite
feasible. You may face performance issues depending on your required SLA,
but the cost savings will most likely convince you to decouple storage from
compute.

Cheers,
Ali

On Tue, Oct 23, 2018 at 2:57 AM deepak kumar <kdq...@gmail.com> wrote:

> Thanks Carolyn.
> Is there any defined reference architecture to refer to?
>
> Thanks
> Deepak
>
> On Mon, Oct 22, 2018 at 8:23 PM Carolyn Duby <cd...@hortonworks.com>
> wrote:
>
> >
> > Hive 3.0 works well with block stores.  You can either add it to your
> > Metron cluster or spin up an ephemeral cluster with Cloudbreak:
> >
> > 1. Metron streams into HDFS in JSON.
> > 2. Compact daily with Spark into ORC format and store in a block store
> > (S3, ADLS, etc.).
> > 3. Query the ORC data in the block store using external Hive 3.0 tables
> > in HDP 3 with LLAP.
> > 4. If querying the block store directly is too slow, try adding more
> > LLAP cache or loading the data into HDFS prior to analysis.
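> > The daily compaction in step 2 can be sketched in PySpark. The path
> > layout, bucket name, and sensor name below are assumptions for
> > illustration, not Metron defaults -- adjust them to your indexing
> > topology's output:

```python
from datetime import date

# Hypothetical layout -- adjust to match your Metron HDFS indexing output.
HDFS_JSON_ROOT = "hdfs:///apps/metron/indexing/indexed"
S3_ORC_ROOT = "s3a://metron-archive/orc"  # bucket name is an assumption


def day_paths(sensor, day):
    """Return (json_input_dir, orc_output_dir) for one sensor-day."""
    suffix = "%s/dt=%s" % (sensor, day.isoformat())
    return ("%s/%s" % (HDFS_JSON_ROOT, suffix),
            "%s/%s" % (S3_ORC_ROOT, suffix))


def compact_day(spark, sensor, day):
    """Rewrite one day of JSON telemetry as ORC in the block store.

    `spark` is an existing SparkSession passed in by the caller, so this
    module stays importable on machines without pyspark installed.
    """
    src, dst = day_paths(sensor, day)
    (spark.read.json(src)          # one day of Metron JSON from HDFS
          .coalesce(8)             # fewer, larger ORC files query better
          .write.mode("overwrite")
          .orc(dst))               # columnar copy lands in S3/ADLS
```

> > Submitted once a day (e.g. from cron or Oozie via spark-submit), this
> > keeps the hot JSON footprint on HDFS small while the cheap ORC history
> > accumulates in the block store for the external Hive tables in step 3.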
> >
> > If you are using the Metron Alerts UI, you will need Solr, which
> > performs well only on fast disk. To keep costs down, reduce the context
> > stored in Solr using the following techniques:
> > 1. Only index the fields you might search on.
> > 2. Store in Solr only the fields you will want to see in the Alerts UI.
> > 3. Reduce the length of time you store data in Solr.
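> > Techniques 1 and 2 translate into per-field settings in Solr's
> > managed-schema. A minimal sketch, assuming hypothetical field names --
> > a field can be stored for display in the Alerts UI without being
> > indexed for search:

```xml
<!-- Searchable field: indexed for queries and stored for display -->
<field name="ip_src_addr" type="string" indexed="true" stored="true"/>
<!-- Display-only field: shown in the Alerts UI but never searched on -->
<field name="original_string" type="string" indexed="false" stored="true"/>
```

> > Dropping the index on bulky fields like the raw message shrinks the
> > Solr index substantially, which is where the fast-disk cost goes.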
> >
> > Thanks
> > Carolyn Duby
> > Solutions Engineer, Northeast
> > cd...@hortonworks.com
> > +1.508.965.0584
> >
> > Join my team!
> > Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> > Solutions Engineer – Boston - http://grnh.se/8gbxy41
> > Need Answers? Try https://community.hortonworks.com/answers/index.html
> >
> > On 10/19/18, 7:18 AM, "deepak kumar" <kdq...@gmail.com> wrote:
> >
> > >Hi All
> > >I have a quick question around HCP deployments in cloud infra such as
> > >AWS. I am planning to run a persistent cluster for all event streaming
> > >and processing, and then run transient clusters such as AWS EMR for
> > >batch loads on the data ingested by the persistent cluster.
> > >Has anyone tried this model?
> > >Since the data volume is going to be humongous, the cloud provider
> > >charges a lot of money for data I/O and storage.
> > >Keeping this in mind, what would be the best cloud deployment of HCP
> > >components, assuming an ingest rate of 10 TB per day?
> > >
> > >Thanks in advance.
> > >
> > >
> > >Regards,
> > >Deepak
> >
>


-- 
A.Nazemian
