Hi Michael,

You can use Apache Atlas as a provenance sink. There is a bridge for Atlas mentioned on Hortonworks, and there is also a current JIRA task (https://issues.apache.org/jira/browse/NIFI-3709).
Best regards
Uwe

> On 23.06.2017 at 18:58, Knapp, Michael <[email protected]> wrote:
>
> Hi,
>
> My team is starting to do more and more with NiFi, and I had several
> questions for you.
>
> First, we are thinking of having multiple separate NiFi flows but we want a
> single source for data provenance. In the source code I only see these
> implementations: PersistentProvenanceRepository,
> VolatileProvenanceRepository, and MockProvenanceRepository. I was hoping to
> find a web service that I could run separately from NiFi, and have all my
> NiFi clusters publish events to that. Is there any public implementation
> like that?
>
> Also, we are thinking seriously about using repositories that are not backed
> by the local file system. I am helping an intern write an implementation of
> ContentRepository that is backed by S3; he has already had some success with
> this (we started by copying a lot from the VolatileContentRepository). I’m
> also interested in implementations backed by Kafka and Pachyderm. If that
> works, we will probably also need the other repositories to follow,
> specifically the FlowFileRepository. Unfortunately, I cannot find a lot of
> documentation on how to write these repositories. I have just been figuring
> things out by reviewing the source code and unit tests, but it is still very
> confusing to me. So I was wondering:
>
> 1. Has anybody been working on alternative ContentRepository
> implementations? Specifically with S3, Pachyderm, Kafka, or some
> databases/datastores?
>
> 2. Is there any thorough documentation regarding the contracts that
> these implementations must adhere to? (besides source code and unit tests)
>
> I’m mainly interested in alternative repositories so I can make NiFi truly
> fault tolerant (one node dies, and the others immediately take over its
> work). Also it would greatly simplify a lot of infrastructure/configuration
> management for us, could help us save some money, and might help us with
> compliance issues.
> On the downside, it might hurt the file throughput.
>
> Please let me know,
>
> Michael Knapp
