[ 
https://issues.apache.org/jira/browse/NIFI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118181#comment-15118181
 ] 

Aldrin Piri commented on NIFI-259:
----------------------------------

[~markap14]

Performed an initial codebase review and that of functionality when starting up 
an instance.  Below are my comments thus far.  Build, contrib, and the like all 
seemed good.  Many of the comments are around the administrator doc for things 
that seemed a little unclear.  I am planning on taking some additional time to 
put things through their paces in a clustered environment and will follow up 
with any findings there, but wanted to get the initial dialog going on the 
findings thus far.

===

General Comments:

The merges from master made it hard to track the work done explicitly on this 
branch and extract out only the efforts related to 259.  Apologies if some of 
the comments are slightly off base, did my best to work with an associated diff 
and some local rebasing.

There are a number of classes that seemingly have only shuffled import order 
(e.g. - ClusteredEventAccess.java, cluster/manager/NodeResponse.java, and 
HttpRequestReplicatorImpl.java among others) and/or mode/permissions change 
(FileBasedClusterNodeFirewallTest.java
index b5f76fb..55c8768 100644).  Not sure if they were from this branch or 
merged in, but make the commits a bit noisy

=== Code/Doc Review/Commentary

StateManager
Scope not being relevant as mentioned in the comments, seems like it could lead 
to imprecise operations in some scenarios.  Perhaps we should have a means of 
configuring tolerance to disconnections when an instance believes it should be 
configured.

ProcessContext
#getStateManager()

Does it make sense to support multiple state managers? Could see this 
potentially having similar semantics as that of the ControllerServices in terms 
of providing lookups.  This comment is certainly not specific to this 
particular component, but made me consider how it might get used.  In 
hindsight, I believe this may be an untenable mess with multiple, but will 
leave the comment for consideration.  Not sure I understand the mapping between 
this and StateManagerProvider#getStateManager

NiFiProperties

STATE_MANAGEMENT_MAX_ZOOKEEPER_SERVERS constant is commented out under the 
guise of being a method comment on #getLocalStateProvider

Administration guide
State Providers - I see there are one or more local providers and zero or more 
cluster providers.  This mapping between these providers and the associated 
manager is a bit confusing to me.

ZooKeeper "Connect String":  provide a sample 
(zk1.myzookeeper.com;zk2.myzookeeper;...)

When talking about the 'Root Node' provide an indication of how NiFi manages 
this space and what happens should it not be available and/or prepopulated

Minor grammar/style:  "If `CreatorOnly` is specified, then only user that is 
allowed to read, change, delete, or administer the data is the user that was 
used".  more appropriately:  "then only the user that created the data is 
allowed to read, change..."

The logic behind standalone mode and its handling of cluster-state-provider is 
involved.  This is a note to myself to review how this looks in context of the 
application and the associated errors.  Might like to see an error if the 
cluster-state-provider is configured.


[[embedded_zookeeper]] section

Grammar/Style:  "What this means is that NiFi has a dependencies on ZooKeeper 
in order to" -> "What this means is that NiFi has dependencies on ZooKeeper..."

Typo:  "At a minimum, This properties file needs to be populated" -> "At a 
minimum, this properties file needs to be populated"

May want to highlight that the specified hostname needs to be reachable from 
the associated peers

[[zk_kerberos_client]] section
Not for the documentation but for the related config, consider providing a 
sample zookeeper-jaas.conf out of the box as well as the associated, commented 
out line in the bootstrap.conf

[[zk_ssl_client/server]] sections

currently blank, and it is my understanding that this is not yet supported with 
the available ZK releases.  need to remove references to this section and SSL 
above and either remove this section or make commentary that it is not 
currently possible (preferred, as the questions are likely to arise)

[[state_scope]] section
Typo:  "If state as stored using..." -> "If state is stored"

"Store and Retrieving State"

"That is, if two Processors store a value using the key _My Key_, those 
Processors
+will not conflict with each other, even if both Processors are of the same 
type (e.g., both are of type ListFile). Furthermore,"

Need to make more explicit that these are instances within the same flow (and 
clear up that in a cluster, they are one instance), was a bit confusing until 
reading through and establishing context


nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/pom.xml
 had its header clobbered by auto format, may want to enable the setting to 
only format changed lines in versioned files

Stateful Annotation
style:  Scope[]scopes -> Scope[] scopes

> Framework should offer Processors/extensions a way to manage simple state
> -------------------------------------------------------------------------
>
>                 Key: NIFI-259
>                 URL: https://issues.apache.org/jira/browse/NIFI-259
>             Project: Apache NiFi
>          Issue Type: Sub-task
>          Components: Core Framework
>            Reporter: Joseph Witt
>            Assignee: Mark Payne
>             Fix For: 0.5.0
>
>
> It is not uncommon for processors to need managed state which  persists 
> across restarts.  One good example of this is the GetHTTP processor which 
> needs to save state of the last cache/e-tag information it received from the 
> server it interacts with so that it can avoid constantly pulling an unchanged 
> resource.
> Rather than making processors roll their own persistant state management 
> perhaps we can offer something to them via the ProcessContext which will 
> allow them to save simple primitive values and limited length Strings which 
> are local to that node and which are local to that processor only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to