Hive 3.0 works well with block stores. You can either add it to your Metron
cluster or spin up an ephemeral cluster with Cloudbreak:

1. Metron streams events into HDFS as JSON.
2. Compact the JSON daily with Spark into ORC and store it in the block store
(S3, ADLS, etc.); see the sketch after this list.
3. Query the ORC in the block store using Hive 3.0 external tables on HDP 3
with LLAP.
4. If querying directly from the block store is too slow, try adding more LLAP
cache or loading the data into HDFS before analysis.
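
For reference, a minimal PySpark sketch of steps 2 and 3 follows. The HDFS
path, the S3 bucket, the sensor name (bro), and the column names are all
assumptions for illustration; substitute your own sensors and schema.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("metron-daily-compaction")
             .enableHiveSupport()
             .getOrCreate())

    # Step 2: read one day of raw JSON events that Metron wrote to HDFS
    # (the path is hypothetical).
    events = spark.read.json("hdfs:///apps/metron/indexing/bro/2018-10-19/*")

    # Coalesce many small files into a few large ones and write ORC to the
    # block store.
    (events.coalesce(8)
        .write
        .mode("append")
        .orc("s3a://my-bucket/metron/orc/bro/dt=2018-10-19"))

    # Step 3: expose the compacted data to Hive 3.0 / LLAP as an external
    # table (the columns are placeholders; list the fields your sensor
    # actually produces).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS metron_bro (
            source_ip STRING,
            dest_ip   STRING,
            `timestamp` BIGINT
        )
        PARTITIONED BY (dt STRING)
        STORED AS ORC
        LOCATION 's3a://my-bucket/metron/orc/bro'
    """)
    spark.sql("MSCK REPAIR TABLE metron_bro")

Running the compaction once per day keeps the file count low, which is what
makes external tables over the block store fast enough to query with LLAP.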

If you are using the Metron Alerts UI, you will need Solr, which performs well
only on fast disk. To keep costs down, reduce the amount of data stored in
Solr using the following techniques:
1. Only index the fields you might search on.
2. Store in Solr only the fields you will want to see in the Alerts UI.
3. Reduce the length of time you store data in Solr; see the sketch after this
list.
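
As a concrete example of technique 3, here is a minimal sketch that prunes old
alerts from Solr, assuming the pysolr client, a collection named
metron_alerts, and a timestamp field in epoch milliseconds (all hypothetical
names). Techniques 1 and 2 are handled in the collection's managed schema by
marking fields indexed=false or stored=false as appropriate.

    import time
    import pysolr

    # Hypothetical Solr URL and collection name.
    solr = pysolr.Solr("http://solr-host:8983/solr/metron_alerts", timeout=30)

    # Keep only the last 30 days of alerts on fast disk; older data stays
    # queryable from the ORC files in the block store.
    cutoff_ms = int((time.time() - 30 * 24 * 3600) * 1000)
    solr.delete(q="timestamp:[* TO %d]" % cutoff_ms)
    solr.commit()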

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com








On 10/19/18, 7:18 AM, "deepak kumar" <kdq...@gmail.com> wrote:

>Hi All
>I have a quick question around HCP deployments in cloud infra such as AWS.
>I am planning to run a persistent cluster for all event streaming and
>processing,
>and then a transient cluster such as AWS EMR to run batch loads on the
>data ingested from the persistent cluster.
>Has anyone tried this model?
>Since the data volume is going to be humongous, the cloud provider charges a
>lot of money for data I/O and storage.
>With this in mind, what would be the best cloud deployment of HCP
>components, assuming an ingest rate of 10 TB per day?
>
>Thanks in advance.
>
>
>Regards,
>Deepak
