[ 
https://issues.apache.org/jira/browse/NIFI-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065432#comment-15065432
 ] 

Matt Gilman edited comment on NIFI-826 at 12/19/15 5:06 PM:
------------------------------------------------------------

Sorry, it had been a while since I've had this on my mind. In order to ensure 
templates are exported in a deterministic way, we need to do the 3 bullets 
identified above. Pruning excess details from the templates (the last bullet) 
is straightforward. That leaves ensuring component IDs are the same and the 
components are in a consistent order in the template. Since component IDs are 
generated when components are added to a flow, they are only present in 
templates to associate the source and destination of connections.

My basic approach was to sort the components to ensure they were always ordered 
the same regardless of the NiFi instance. More specifically, I tried to 
accomplish this without introducing any new concepts into the flow.xml.

The sorting strategy could consider any number of configuration details. First 
we'd try the name. If the component didn't have a name we could fall back to 
the position. Unfortunately, this started to break down and had the side effect 
that you mentioned about moving a component affecting its position in the 
template. This would hold true for any configuration details.
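
To make the breakdown concrete, here is a minimal sketch of that name-then-position 
sort. It is not NiFi code: the Component class, config_sort_key and export_order 
are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Component:
    # Hypothetical stand-in for a flow component; not a NiFi class.
    name: Optional[str]
    position: Tuple[float, float]


def config_sort_key(c: Component):
    # Named components sort by name; unnamed ones fall back to position.
    return (c.name is None, c.name or "", c.position)


def export_order(components: List[Component]) -> List[Component]:
    return sorted(components, key=config_sort_key)


a = Component(name=None, position=(0.0, 0.0))
b = Component(name=None, position=(100.0, 0.0))
before = export_order([a, b])

# Dragging 'a' elsewhere on the canvas changes the exported order,
# even though nothing about the flow's behavior changed.
a.position = (200.0, 0.0)
after = export_order([a, b])
```

Any key built from editable configuration has the same weakness: editing the 
configuration reorders the template.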

You're correct that just introducing a new ID (template ID) would lead to 
collisions. Part of that idea that I forgot to mention was also adding a 
timestamp that represents when the component was first added to a flow. The 
template ID and timestamp travel with the component in the template and are 
used when the component is imported. If the component being imported doesn't 
have a template ID or timestamp (templates from earlier versions), or if there 
is a template ID collision, we would generate a new timestamp.
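
The import rule could look roughly like the sketch below. The function name, its 
parameters and the millisecond clock are assumptions made for illustration, and 
what happens to the template ID itself on a collision is left out because it 
isn't decided above.

```python
import time
from typing import Callable, Optional, Set


def timestamp_on_import(
    template_id: Optional[str],
    timestamp: Optional[int],
    ids_in_flow: Set[str],
    now: Callable[[], int] = lambda: int(time.time() * 1000),
) -> int:
    """Return the timestamp an imported component should carry.

    Hypothetical helper: keeps the timestamp that travelled with the
    template unless the rule above says a new one is needed.
    """
    # Templates from earlier versions carry neither field.
    missing = template_id is None or timestamp is None
    # The same template ID is already present in the target flow.
    collision = template_id in ids_in_flow
    if missing or collision:
        return now()
    return timestamp
```

Passing the clock in as a parameter is only there to make the sketch easy to 
exercise with a fixed value.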

With this approach, ordering could be based solely on the timestamp. Since the 
timestamp travels with the component, we can ensure consistent ordering 
regardless of when/where the component was first added to a flow. Any new 
components, newly added or reintroduced via subsequent copy/paste or import, 
would get a new current timestamp. Because of this, they would always end up at 
the end of the listing. With this ordering ensured, we should be able to 
generate IDs using a simple one-up number.
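
A rough sketch of the export side, assuming timestamps are distinct (ties aren't 
addressed above). Component and export_template are hypothetical names, not 
NiFi's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    # Hypothetical stand-in for a flow component; not a NiFi class.
    instance_id: str  # ID generated by the local NiFi instance
    timestamp: int    # when first added to a flow; travels with the template


def export_template(
    components: List[Component],
    connections: List[Tuple[str, str]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    # Order solely by timestamp (assumed distinct in this sketch).
    ordered = sorted(components, key=lambda c: c.timestamp)

    # Replace instance-specific IDs with a simple one-up number.
    new_ids: Dict[str, int] = {
        c.instance_id: i for i, c in enumerate(ordered, start=1)
    }

    exported_components = [(new_ids[c.instance_id], c.timestamp) for c in ordered]

    # Connections only need the IDs to tie source to destination.
    exported_connections = sorted(
        (new_ids[src], new_ids[dst]) for src, dst in connections
    )
    return exported_components, exported_connections
```

Two instances holding the same components under different instance IDs, and in 
a different in-memory order, would produce the same output here, which is what 
makes the templates diffable.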


> Export templates in a deterministic way
> ---------------------------------------
>
>                 Key: NIFI-826
>                 URL: https://issues.apache.org/jira/browse/NIFI-826
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Matt Gilman
>            Assignee: Matt Gilman
>
> Templates should be exported in a deterministic way so that they can be 
> compared or diff'ed with another. Items to consider...
> - The ordering of components
> - The id's used to identify the components
> - Consider excluding irrelevant items. When components are imported some 
> settings are ignored (run state).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
