RE: Long-term storage for enriched data

Carolyn Duby Wed, 21 Dec 2016 16:27:07 -0800

Hi Dima -

What type of analytics are you looking to do?  Is the normalized format not 
working?  You could use an Oozie or Spark job to create derivative tables.


NiFi may be overkill for breaking up the Kafka stream.  Spark Streaming may be 
easier.
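
To illustrate the split step, here is a minimal sketch in plain Python, so it
runs without a cluster. It assumes each message on the indexing topic is a JSON
object carrying a "source.type" field; the "enriched_" table naming scheme and
the function names are hypothetical. In a Spark Streaming job, similar grouping
logic could be applied to each micro-batch before writing to Hive.

```python
import json
from collections import defaultdict

# Assumption: indexed messages are JSON objects with a "source.type" field
# (for example "bro" or "snort"). Change this if your messages differ.
SOURCE_TYPE_FIELD = "source.type"


def table_for(source_type):
    """Map a source type to a table name (hypothetical naming scheme)."""
    safe = "".join(c if c.isalnum() else "_" for c in source_type.lower())
    return "enriched_" + safe


def split_by_source_type(raw_messages):
    """Group raw JSON strings into per-table lists of records.

    Messages that are not JSON objects, or that lack a usable source type,
    are kept under the "unknown" table so that nothing is silently dropped.
    """
    batches = defaultdict(list)
    for raw in raw_messages:
        try:
            record = json.loads(raw)
        except ValueError:
            record = None
        source_type = None
        if isinstance(record, dict):
            source_type = record.get(SOURCE_TYPE_FIELD)
        if not isinstance(source_type, str) or not source_type:
            batches[table_for("unknown")].append({"_raw": raw})
        else:
            batches[table_for(source_type)].append(record)
    return dict(batches)
```

Each resulting list can then be appended to its own table, which gives the one
table per source type layout without a separate routing tool.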

Thanks
Carolyn





-------- Original message --------
From: Dima Kovalyov <[email protected]>
Date: 12/21/16 6:28 PM (GMT-05:00)
To: [email protected]
Subject: Long-term storage for enriched data

Hello,

We are currently researching a fast and resource-efficient way to save
enriched data in Hive for further analytics.

There are two scenarios that we are considering:
a) Use an Oozie Java job that uses the Metron enrichment classes to "manually"
enrich each line of the source data that is picked up from the source
dir. We have already developed this job ourselves and are using it.
Downside: it is custom code built on top of the Metron source code.

b) Use NiFi to listen on the indexing Kafka topic -> split the stream by
source type -> put each source type into its corresponding Hive table.

I wonder whether anyone has gone in either of these directions and whether
there are best practices for this. Please advise.
Thank you.

- Dima
