[ https://issues.apache.org/jira/browse/NIFIREG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456525#comment-16456525 ]

ASF GitHub Bot commented on NIFIREG-162:
----------------------------------------

Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi-registry/pull/112#discussion_r184706432
  
    --- Diff: nifi-registry-docs/src/main/asciidoc/administration-guide.adoc ---
    @@ -895,3 +895,167 @@ Providing 2 total locations, including `nifi.registry.extension.dir.1`.
       Example: `/etc/http-nifi-registry.keytab`
     |nifi.registry.kerberos.spengo.authentication.expiration|The expiration duration of a successful Kerberos user authentication, if used. The default value is `12 hours`.
     |====
    +
    +== Persistence Providers
    +
    +NiFi Registry uses a pluggable flow persistence provider to store the content of the flows saved to the registry. NiFi Registry provides two implementations: `<<FileSystemFlowPersistenceProvider>>` and `<<GitFlowPersistenceProvider>>`.
    +
    +Each persistence provider has its own configuration parameters, which can be configured in an XML file specified in <<Providers Properties,nifi-registry.properties>>.
    +
    +The XML configuration file looks like the example below. It contains a `flowPersistenceProvider` element, in which the fully qualified class name of a persistence provider implementation and its configuration properties are defined. See the following sections for the configuration properties available for each provider.
    +
    +.Example providers.xml
    +[source,xml]
    +....
    +<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    +<providers>
    +
    +    <flowPersistenceProvider>
    +        <class>persistence-provider-qualified-class-name</class>
    +        <property name="property-1">property-value-1</property>
    +        <property name="property-2">property-value-2</property>
    +        <property name="property-n">property-value-n</property>
    +    </flowPersistenceProvider>
    +
    +</providers>
    +....
    +
    +
    +=== FileSystemFlowPersistenceProvider
    +
    +FileSystemFlowPersistenceProvider simply stores serialized Flow contents into `{bucket-id}/{flow-id}/{version}` directories.
    +
    +Example of persisted files:
    +....
    +Flow Storage Directory/
    +├── {bucket-id}/
    +│   └── {flow-id}/
    +│       └── {version}/{version}.snapshot
    +└── d1beba88-32e9-45d1-bfe9-057cc41f7ce8/
    +    └── 219cf539-427f-43be-9294-0644fb07ca63/
    +        ├── 1/1.snapshot
    +        └── 2/2.snapshot
    +....
    +
    +Qualified class name: `org.apache.nifi.registry.provider.flow.FileSystemFlowPersistenceProvider`
    +
    +|====
    +|*Property*|*Description*
    +|Flow Storage Directory|REQUIRED: File system path of the directory where flow content files are persisted. If the directory does not exist when NiFi Registry starts, it will be created. If the directory exists, it must be readable and writable by NiFi Registry.
    +|====
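    +
    +As an illustration, a complete providers XML file that selects this provider might look like the following minimal sketch. The storage path `./flow_storage` is an example value chosen for this sketch, not a required setting.
    +
    +.Example providers.xml for FileSystemFlowPersistenceProvider (illustrative)
    +[source,xml]
    +....
    +<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    +<providers>
    +
    +    <flowPersistenceProvider>
    +        <class>org.apache.nifi.registry.provider.flow.FileSystemFlowPersistenceProvider</class>
    +        <!-- Example path (assumption); created at startup if it does not exist -->
    +        <property name="Flow Storage Directory">./flow_storage</property>
    +    </flowPersistenceProvider>
    +
    +</providers>
    +....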
    +
    +
    +=== GitFlowPersistenceProvider
    +
    +GitFlowPersistenceProvider stores flow contents under a Git directory.
    +
    +In contrast to FileSystemFlowPersistenceProvider, this provider uses human-friendly Bucket and Flow names so that the files can be accessed by external tools. However, modifying stored files outside of NiFi Registry is NOT supported. Persisted files are only read when NiFi Registry starts up.
    +
    +Buckets are represented as directories, and Flow contents are stored as files in the Bucket directory they belong to. Flow snapshot histories are managed as Git commits, meaning only the latest version of each Bucket and Flow exists in the Git directory. Older versions are retrieved from the Git commit history.
    +
    +.Example persisted files
    +....
    +Flow Storage Directory/
    +├── .git/
    +├── Bucket A/
    +│   ├── bucket.yml
    +│   ├── Flow 1.snapshot
    +│   └── Flow 2.snapshot
    +└── Bucket B/
    +    ├── bucket.yml
    +    └── Flow 4.snapshot
    +....
    +
    +Each Bucket directory contains a YAML file named `bucket.yml`. The file maps NiFi Registry Bucket and Flow IDs to actual directory and file names. When NiFi Registry starts, this provider reads through the Git commit history and looks up these `bucket.yml` files to restore the Buckets and Flows for each snapshot version.
    +
    +.Example bucket.yml
    +[source,yml]
    +....
    +layoutVer: 1
    +bucketId: d1beba88-32e9-45d1-bfe9-057cc41f7ce8
    +flows:
    +  219cf539-427f-43be-9294-0644fb07ca63: {ver: 7, file: Flow 1.snapshot}
    +  22cccb6c-3011-4493-a996-611f8f112969: {ver: 3, file: Flow 2.snapshot}
    +....
    +
    +Qualified class name: `org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider`
    +
    +|====
    +|*Property*|*Description*
    +|Flow Storage Directory|REQUIRED: File system path of the directory where flow content files are persisted. The directory must exist when NiFi Registry starts and must be initialized as a Git directory. See <<Initialize Git directory>> for details.
    +|Remote To Push|When a new flow snapshot is created, this persistence provider updates files in the specified Git directory, then creates a commit in the local repository. If `Remote To Push` is defined, it also pushes to the specified remote repository, e.g. `origin`. To define a more detailed remote spec, such as branch names, use a refspec. See https://git-scm.com/book/en/v2/Git-Internals-The-Refspec
    +|Remote Access User|This user name is used to make push requests to the remote repository when `Remote To Push` is enabled and the remote repository is accessed over HTTP. If SSH is used, user authentication is done with SSH keys.
    +|Remote Access Password|Used with `Remote Access User`.
    +|====
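    +
    +The following is a minimal sketch of a GitFlowPersistenceProvider configuration that pushes to a remote over HTTP. It assumes a hypothetical local Git directory `./flow_storage_git`, a remote named `origin` already defined in that repository, and placeholder credentials; all of these are example values to adjust for your environment.
    +
    +.Example providers.xml for GitFlowPersistenceProvider with an HTTP remote (illustrative)
    +[source,xml]
    +....
    +<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    +<providers>
    +
    +    <flowPersistenceProvider>
    +        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    +        <!-- Hypothetical path; must already exist and be initialized as a Git directory -->
    +        <property name="Flow Storage Directory">./flow_storage_git</property>
    +        <!-- Assumed remote name, already defined in the local repository -->
    +        <property name="Remote To Push">origin</property>
    +        <!-- Placeholder credentials (assumption), used only for HTTP remotes -->
    +        <property name="Remote Access User">registry-git-user</property>
    +        <property name="Remote Access Password">change-me</property>
    +    </flowPersistenceProvider>
    +
    +</providers>
    +....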
    +
    +==== Initialize Git directory
    +
    +In order to use GitFlowPersistenceProvider, you need to prepare a Git directory on the local file system. You can do so by initializing a directory with the `git init` command, or by cloning an existing Git project from a remote Git repository with the `git clone` command.
    +
    +- Git init command
    +https://git-scm.com/docs/git-init
    +- Git clone command
    +https://git-scm.com/docs/git-clone
    +
    +
    +==== Git user configuration
    +
    +Git identifies a user by a user name and an email address. This persistence provider uses the NiFi Registry user name when it creates Git commits. However, since NiFi Registry users do not have email addresses, the preconfigured Git user email address is used.
    +
    +You can configure the Git user name and email address with the `git config` command.
    +
    +- Git config command
    +https://git-scm.com/docs/git-config
    +
    +
    +==== Git user authentication
    +
    +By default, this persistence provider only creates commits in the local repository, and no user authentication is needed to do so. However, if `Remote To Push` is defined, user authentication to the remote Git repository is required.
    +
    +If the remote repository is accessed over HTTP, the user name and password for authentication can be configured in the providers XML configuration file, using the `Remote Access User` and `Remote Access Password` properties.
    +
    +When SSH is used, SSH keys are used to identify a Git user. To pick the right key for a remote server, the SSH configuration file `${USER_HOME}/.ssh/config` is used. The SSH configuration file can contain multiple `Host` entries, each specifying the key file used to log in to a remote Git server. The `Host` value must match the hostname of the target remote Git server.
    +
    +.Example SSH config file
    +....
    +Host git.example.com
    +  HostName git.example.com
    +  IdentityFile ~/.ssh/id_rsa
    +
    +Host github.com
    +  HostName github.com
    +  IdentityFile ~/.ssh/key-for-github
    +
    +Host bitbucket.org
    +  HostName bitbucket.org
    +  IdentityFile ~/.ssh/key-for-bitbucket
    +....
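    +
    +For an SSH remote, a configuration sketch could simply leave out the HTTP credentials, since the property table above states that SSH authentication is done with SSH keys. This assumes a hypothetical local Git directory `./flow_storage_git` whose remote `origin` points at an SSH URL on one of the hosts listed in the SSH config file; both names are example values.
    +
    +.Example providers.xml for GitFlowPersistenceProvider with an SSH remote (illustrative)
    +[source,xml]
    +....
    +<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    +<providers>
    +
    +    <flowPersistenceProvider>
    +        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    +        <!-- Hypothetical path; must already exist and be initialized as a Git directory -->
    +        <property name="Flow Storage Directory">./flow_storage_git</property>
    +        <!-- Assumed remote whose URL uses SSH; the key is selected via ~/.ssh/config -->
    +        <property name="Remote To Push">origin</property>
    +        <!-- Remote Access User and Remote Access Password are left out for SSH -->
    +    </flowPersistenceProvider>
    +
    +</providers>
    +....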
    +
    +=== Data model version of serialized Flow snapshots
    +
    +Serialized Flow snapshots saved by these persistence providers carry a data model version, so that the data format and schema can evolve over time. NiFi Registry updates the data model version automatically when it reads and stores each Flow's content.
    +
    +The data model version history is as follows:
    +
    +|====
    +|*Data model version*|*Since NiFi Registry*|*Description*
    +|2|0.2|JSON formatted text file. The root object contains a header and a Flow content object.
    +|1|0.1|Binary format with header bytes at the beginning, followed by Flow content represented as XML.
    +|====
    +
    +=== Migrating stored files between different Persistence Providers
    --- End diff --
    
    Yeah, I agree that in the future we could probably offer a bulk import/export via the CLI that only needs to use the REST API. We already have a JIRA for some import/export endpoints (https://issues.apache.org/jira/browse/NIFIREG-148), so it might just be a matter of adding CLI commands that use those once they exist.
    
    For this first release, since we only have two providers and the registry is very new, I think it is acceptable to say that if you want to use the Git provider, you can start over with it.


> Add Git backed persistence provider
> -----------------------------------
>
>                 Key: NIFIREG-162
>                 URL: https://issues.apache.org/jira/browse/NIFIREG-162
>             Project: NiFi Registry
>          Issue Type: Improvement
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>            Priority: Major
>
> Currently, NiFi Registry provides FileSystemFlowPersistenceProvider, which
> stores Flow snapshot files in the local file system. It simply manages snapshot
> versions by creating directories with version numbers.
> While it works, there is also demand for using Git as a version control and
> persistence mechanism.
> A Git-backed persistence repository would be beneficial in the following aspects:
> * Git is an SCM (Source Control Management) system that natively manages
> commits, branches, file diffs, and patches, and provides ways to contribute and
> apply changes among users
> * Local and remote Git repositories can form distributed, reliable storage
> * There are several Git repository services on the internet which can serve as
> remote Git repositories for backup storage
> There are a few things in the current NiFi Registry framework and the existing
> FileSystemFlowPersistenceProvider that may not be Git friendly:
> * Bucket ids and Flow ids are UUIDs and not recognizable by humans. If those
> files had human-readable names, many Git commands and tools would be easier
> to use.
> * Current serialized Flow snapshots are binary files with header bytes and
> XML-encoded flow contents. If they were in a pure ASCII format, Git could
> provide better diffs between commits, which would improve the UX of
> controlling Flow snapshot versions
> * The NiFi Registry user id, which could be used as the author of a Git commit,
> is not available in FlowSnapshotContext
> Also, if we are going to add another Persistence Provider implementation, we
> also need to provide a way to migrate existing persisted files so that they
> can be used by the new one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
