Thank you for the reply, Carolyn,

Currently, for test purposes, we enrich the flow with Geo and ThreatIntel
malware IP data, but we plan to expand this further.

Our dev team is working on an Oozie job to process this. In the meantime,
I wonder whether I could use NiFi for this purpose (because we are already
using it for data ingest and streaming).

Could you elaborate on why it may be overkill? The idea is to have
everything in one place instead of hacking into Metron libraries and code.
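
To make the "split by source type" step concrete, here is a minimal
sketch of the logic only, not a NiFi flow or Metron code. It assumes each
message on the indexing topic is a JSON object carrying a "source.type"
field; the function name and the sample records are made up for
illustration.

```python
import json
from collections import defaultdict


def split_by_source_type(lines, field="source.type"):
    """Group JSON messages by source type (the field name is an assumption)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        # Messages without the field land in a catch-all "unknown" group.
        groups[msg.get(field, "unknown")].append(msg)
    return dict(groups)


# Hypothetical sample records standing in for messages from the topic.
sample = [
    '{"source.type": "bro", "ip_src_addr": "10.0.0.1"}',
    '{"source.type": "snort", "ip_src_addr": "10.0.0.2"}',
    '{"source.type": "bro", "ip_src_addr": "10.0.0.3"}',
]

for source_type, msgs in split_by_source_type(sample).items():
    print(source_type, len(msgs))
```

Each group would then be written to the Hive table for that source type,
whichever tool ends up doing the routing.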

- Dima

On 12/22/2016 02:26 AM, Carolyn Duby wrote:
> Hi Dima -
>
> What type of analytics are you looking to do?  Is the normalized format not
> working?  You could use an Oozie or Spark job to create derivative tables.
>
> NiFi may be overkill for breaking up the Kafka stream.  Spark Streaming may
> be easier.
>
> Thanks
> Carolyn
>
>
> -------- Original message --------
> From: Dima Kovalyov <[email protected]>
> Date: 12/21/16 6:28 PM (GMT-05:00)
> To: [email protected]
> Subject: Long-term storage for enriched data
>
> Hello,
>
> Currently we are researching a fast and resource-efficient way to save
> enriched data in Hive for further analytics.
>
> There are two scenarios that we consider:
> a) Use an Oozie Java job that uses Metron enrichment classes to "manually"
> enrich each line of the source data that is picked up from the source
> dir. We have already developed this job ourselves and are using it.
> Downside: custom code that is built on top of Metron source code.
>
> b) Use NiFi to listen to the indexing Kafka topic -> split the stream by
> source type -> put every source type in the corresponding Hive table.
>
> I wonder whether anyone has gone in either of these directions and whether
> there are best practices for this. Please advise.
> Thank you.
>
> - Dima
>
>
