Rick,

There have been a few requests for a first-class state management feature, and it is definitely on the community's radar.
Right now, a good example of the current approach is the ListHDFS processor. It uses a combination of a local state file and the DistributedMapCache controller service. In a cluster, ListHDFS is scheduled to run only on the primary node; by using the DistributedMapCache, every node in the cluster knows where to pick up if the primary node changes. A few other processors also use the local state file approach; I believe GetHTTP and GetSolr are two of them.

-Bryan

On Tue, Sep 1, 2015 at 11:59 AM, Rick Braddy <[email protected]> wrote:
> Hi,
>
> I have a NiFi design question. In order to process extremely large files
> (any size), we intend to create a processor that reads the file in "chunks"
> and sends it as a multi-part FlowFile series, which will avoid using up all
> available content repository and/or JVM space.
>
> One way would be to create our own state file that contains the latest job
> information (per thread/job), but that seems very clunky.
>
> The question is: with long-running processes like this that need to be
> restartable (without starting from the beginning on big files), are there
> any standard NiFi design patterns we should consider?
>
> Thanks in advance.
> Rick
>
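To make the restartable-chunking pattern concrete, here is a minimal, self-contained sketch. The class and method names (ChunkedReader, nextChunk) are hypothetical, and a plain HashMap stands in for the shared state store — in a real processor that role would be played by the local state file or the DistributedMapCache controller service, as described above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of restartable chunked reading. A HashMap stands in for the
 * shared state store (a local state file or the DistributedMapCache
 * controller service in an actual NiFi processor).
 */
public class ChunkedReader {
    // Key: file path; value: byte offset of the next unread chunk.
    private final Map<String, Long> state = new HashMap<>();

    /** Reads the next chunk of up to chunkSize bytes, resuming from the stored offset. */
    public byte[] nextChunk(Path file, int chunkSize) throws IOException {
        long offset = state.getOrDefault(file.toString(), 0L);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (offset >= raf.length()) {
                return new byte[0]; // file fully emitted; nothing left to read
            }
            raf.seek(offset);
            int toRead = (int) Math.min(chunkSize, raf.length() - offset);
            byte[] chunk = new byte[toRead];
            raf.readFully(chunk);
            // Commit the new offset only after the chunk is safely read, so a
            // restart between chunks resumes from the last committed offset
            // instead of the beginning of the file.
            state.put(file.toString(), offset + toRead);
            return chunk;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk-demo", ".txt");
        Files.write(tmp, "abcdefghij".getBytes(StandardCharsets.UTF_8));
        ChunkedReader reader = new ChunkedReader();
        System.out.print(new String(reader.nextChunk(tmp, 4), StandardCharsets.UTF_8));
        System.out.print(new String(reader.nextChunk(tmp, 4), StandardCharsets.UTF_8));
        System.out.println(new String(reader.nextChunk(tmp, 4), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

In a clustered deployment the `state.put(...)` call would be replaced by a write to the DistributedMapCache, so that if the primary node changes, the new primary can look up the last committed offset and continue rather than re-emitting the whole file.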
