Hi Michael,

you can use Apache Atlas as a provenance sink.
There is a bridge for Atlas mentioned by Hortonworks, and there is also an ongoing JIRA task for it
(https://issues.apache.org/jira/browse/NIFI-3709).

Best regards
Uwe

> On 23.06.2017 at 18:58, Knapp, Michael <[email protected]> wrote:
> 
> Hi,
> 
> My team is starting to do more and more with NiFi, and I had several 
> questions for you.
> 
> First, we are thinking of having multiple separate NiFi flows, but we want a 
> single source for data provenance.  In the source code I only see these 
> implementations: PersistentProvenanceRepository, 
> VolatileProvenanceRepository, and MockProvenanceRepository.  I was hoping to 
> find a web service that I could run separately from NiFi, and have all my 
> NiFi clusters publish events to that.  Is there any public implementation 
> like that?
> 
> Also, we are thinking seriously about using repositories that are not backed 
> by the local file system.  I am helping an intern write an implementation of 
> ContentRepository that is backed by S3; he has already had some success with 
> this (we started by copying a lot from the VolatileContentRepository).  I’m 
> also interested in implementations backed by Kafka and Pachyderm.  If that 
> works, we will probably also need the other repositories to follow, 
> specifically the FlowFileRepository.  Unfortunately, I cannot find a lot of 
> documentation on how to write these repositories; I have just been figuring 
> things out by reviewing the source code and unit tests, but it is still very 
> confusing to me.  So I was wondering:
> 
> 1. Has anybody been working on alternative ContentRepository 
> implementations, specifically with S3, Pachyderm, Kafka, or other 
> databases/datastores?
> 
> 2. Is there any thorough documentation (besides the source code and unit 
> tests) on the contracts that these implementations must adhere to?
> 
> I’m mainly interested in alternative repositories so I can make NiFi truly 
> fault tolerant (one node dies, and the others immediately take over its 
> work).  Also, it would greatly simplify a lot of infrastructure/configuration 
> management for us, could help us save some money, and might help us with 
> compliance issues.  On the downside, it might reduce file throughput.
> 
> Please let me know,
> 
> Michael Knapp
> 
> ________________________________________________________
> 
> The information contained in this e-mail is confidential and/or proprietary 
> to Capital One and/or its affiliates and may only be used solely in 
> performance of work or services for Capital One. The information transmitted 
> herewith is intended only for use by the individual or entity to which it is 
> addressed. If the reader of this message is not the intended recipient, you 
> are hereby notified that any review, retransmission, dissemination, 
> distribution, copying or other use of, or taking of any action in reliance 
> upon this information is strictly prohibited. If you have received this 
> communication in error, please contact the sender and delete the material 
> from your computer.
