Hive 3.0 works well with block stores. You can either add it to your Metron cluster or spin up an ephemeral cluster with Cloudbreak:
1. Metron streams into HDFS in JSON.
2. Compact daily with Spark into ORC format and store it in a block store (S3, ADLS, etc.).
3. Query the ORC in the block store using external Hive 3.0 tables in HDP 3, using LLAP.
4. If querying the block store directly is too slow, try adding more LLAP cache or load the data into HDFS prior to analysis.

If you are using the Metron Alerts UI, you will also need Solr, which performs well only on fast disk. To keep costs down, reduce what you store in Solr using the following techniques:

1. Only index the fields you might search on.
2. Reduce what you store in Solr to only the fields you will want to see in the Alerts UI.
3. Reduce the length of time you keep data in Solr.

Rough sketches of steps 2 and 3 and of the Solr tuning are appended at the bottom of this message.

Thanks

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41

Need Answers? Try https://community.hortonworks.com

On 10/19/18, 7:18 AM, "deepak kumar" <kdq...@gmail.com> wrote:

>Hi All,
>I have a quick question about HCP deployments on cloud infrastructure such as AWS.
>I am planning to run a persistent cluster for all event streaming and processing,
>and then run transient clusters such as AWS EMR to run batch loads on the data
>ingested from the persistent cluster.
>Has anyone tried this model?
>Since the data volume is going to be humongous, the cloud charges a lot of money
>for data I/O and storage.
>Keeping this in mind, what would be the best cloud deployment of the HCP
>components, assuming an ingest rate of 10 TB per day?
>
>Thanks in advance.
>
>Regards,
>Deepak
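
For step 2, here is a minimal sketch of what the daily compaction job could look like, written as a PySpark batch job. The Metron HDFS path, the coalesce factor, and the s3a://my-metron-archive bucket layout are assumptions you would adapt to your environment, and the cluster is assumed to already have the S3A connector configured.

# Daily compaction: read one day of Metron JSON from HDFS, rewrite it as ORC in the block store.
from pyspark.sql import SparkSession
import sys

spark = (SparkSession.builder
         .appName("metron-daily-orc-compaction")
         .enableHiveSupport()
         .getOrCreate())

sensor, day = sys.argv[1], sys.argv[2]   # e.g. "bro", "2018-10-19" (hypothetical arguments)

# Read the day's raw JSON written by Metron's HDFS indexing topology (path is an assumption).
raw = spark.read.json("hdfs:///apps/metron/indexing/indexed/%s/%s/*" % (sensor, day))

# Coalesce the many small streaming files into a few large ORC files and write them to the
# block store, partitioned by sensor and day so Hive can prune partitions later.
(raw.coalesce(8)
    .write
    .mode("overwrite")
    .format("orc")
    .save("s3a://my-metron-archive/telemetry/sensor=%s/dt=%s" % (sensor, day)))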
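
For step 3, here is a sketch of the external table over the compacted ORC, expressed through spark.sql() only to stay in the same language as the job above; on HDP 3, Spark and Hive keep separate catalogs, so in practice you would likely run the equivalent DDL in beeline against HiveServer2 Interactive (LLAP). The metron_archive database, the column list, and the bucket location are assumptions.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("metron-archive-ddl")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS metron_archive")

# External table pointing at the ORC files written by the compaction job above
# (columns shown are illustrative Metron telemetry fields).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS metron_archive.telemetry (
        source_type  STRING,
        ip_src_addr  STRING,
        ip_dst_addr  STRING,
        `timestamp`  BIGINT
    )
    PARTITIONED BY (sensor STRING, dt STRING)
    STORED AS ORC
    LOCATION 's3a://my-metron-archive/telemetry'
""")

# Register the partitions written by the daily compaction job.
spark.sql("MSCK REPAIR TABLE metron_archive.telemetry")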
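
Finally, a sketch of the Solr cost-reduction ideas, assuming a collection named metron_alerts with a managed schema (if your install uses a classic schema.xml, you would make the equivalent change there instead), a Solr endpoint on localhost, and an epoch-millisecond "timestamp" field; the field names are illustrative.

import time
import requests

SOLR = "http://localhost:8983/solr/metron_alerts"

# Points 1 and 2: index only the fields you search on, and store only the fields you
# want to see in the Alerts UI. indexed=True / stored=False keeps a field searchable
# without paying to store and return its contents.
requests.post(SOLR + "/schema", json={
    "add-field": {
        "name": "ip_src_addr",   # illustrative field name
        "type": "string",
        "indexed": True,
        "stored": False
    }
}).raise_for_status()

# Point 3: keep the retention window short, e.g. delete documents older than 30 days.
cutoff_ms = int(time.time() * 1000) - 30 * 24 * 60 * 60 * 1000
requests.post(SOLR + "/update?commit=true", json={
    "delete": {"query": "timestamp:[* TO %d]" % cutoff_ms}
}).raise_for_status()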