[ https://issues.apache.org/jira/browse/NIFI-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065432#comment-15065432 ]
Matt Gilman edited comment on NIFI-826 at 12/19/15 5:06 PM:
------------------------------------------------------------
Sorry, it has been a while since I've had this on my mind. In order to ensure
templates are exported in a deterministic way, we need to do the 3 bullets
identified above. Pruning excess details from the templates (the last bullet)
is straightforward. That leaves ensuring component IDs are the same and the
components are in a consistent order in the template. Since component IDs are
generated when they are added to a flow, the component IDs are only present in
templates to associate the source and destination of connections.
My basic approach was to sort the components to ensure they were always ordered
the same regardless of the NiFi instance. More specifically, I tried to
accomplish this without introducing any new concepts into the flow.xml.
The sorting strategy could consider any number of configuration details. First
we'd try the name. If the component didn't have a name we could fall back to
the position. Unfortunately, this started to break down and had the side effect
that you mentioned about moving a component affecting its position in the
template. This would hold true for any configuration details.
You're correct that by just introducing a new ID (template ID), we would have
collisions. Part of that idea that I forgot to mention was also adding a
timestamp that represents when the component was first added to a flow. The
template ID and timestamp travel with the component in the template and are
used when the component is imported. If the component being imported doesn't
have a template ID or timestamp (templates from earlier versions), or if there
is a template ID collision, we would generate a new timestamp.
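
A minimal sketch of the import rule described in the previous paragraph,
assuming hypothetical names (ImportedComponent, template_id, added_timestamp,
resolve_timestamp) that are not part of NiFi. It only illustrates when the
travelling timestamp would be kept and when a new one would be generated:

```python
import time
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ImportedComponent:
    # Hypothetical model of a component read from a template.
    name: str
    template_id: Optional[str] = None      # absent in older templates
    added_timestamp: Optional[int] = None  # ms since epoch, absent in older templates


def resolve_timestamp(component: ImportedComponent,
                      existing_template_ids: Set[str],
                      now_ms: Optional[int] = None) -> int:
    """Return the timestamp the component should carry after import."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)

    # Templates from earlier versions carry neither field.
    missing = component.template_id is None or component.added_timestamp is None
    # The same template ID is already present in the target flow.
    collision = component.template_id in existing_template_ids

    if missing or collision:
        return now_ms                     # treat as newly added
    return component.added_timestamp      # keep the original timestamp
```

For example, a component imported with template ID "t1" into a flow that
already contains "t1" would get the current time rather than its original
timestamp.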
With this approach, ordering could be based solely on the timestamp. Since the
timestamp travels with the component, we can ensure consistent ordering
regardless of when/where the component was first added to a flow. Any new
components, newly added or reintroduced via subsequent copy/paste or import,
would get a new current timestamp. Because of this, they would always end up at
the end of the listing. With this ordering ensured, we should be able to
generate IDs using a simple one-up number.
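
To illustrate, here is a rough sketch of the export side: order by timestamp,
hand out one-up IDs, and rewrite connection endpoints to use them. The names
(Component, instance_id, added_timestamp, assign_export_ids) are made up for
the example and are not NiFi classes. It assumes timestamps are unique; ties
would need some additional tie-breaker to stay deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    # Hypothetical model of a component in a flow.
    instance_id: str       # ID generated by the NiFi instance (differs per instance)
    added_timestamp: int   # when the component was first added to a flow (ms)


def assign_export_ids(
    components: List[Component],
    connections: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Map instance IDs to one-up IDs and rewrite (source, destination) pairs."""
    # Order depends only on the timestamp that travels with each component.
    ordered = sorted(components, key=lambda c: c.added_timestamp)

    # One-up numbering: the oldest component gets "1", the next "2", and so on.
    id_map = {c.instance_id: str(i) for i, c in enumerate(ordered, start=1)}

    # Connections reference components by ID, so their endpoints are remapped.
    rewritten = [(id_map[src], id_map[dst]) for src, dst in connections]
    return id_map, rewritten
```

With components added at timestamps 100 and 200, the first is exported as "1"
and the second as "2" no matter what instance IDs they have or what order they
are passed in.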
> Export templates in a deterministic way
> ---------------------------------------
>
> Key: NIFI-826
> URL: https://issues.apache.org/jira/browse/NIFI-826
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Matt Gilman
> Assignee: Matt Gilman
>
> Templates should be exported in a deterministic way so that they can be
> compared or diff'ed with another. Items to consider...
> - The ordering of components
> - The id's used to identify the components
> - Consider excluding irrelevant items. When components are imported some
> settings are ignored (run state).