JoeS, I think you are seeing a queue bug that has already been reported or corrected on the 1.x line.

As for the frankencluster concept, I think it is generally fair game. There are a number of design reasons, most notably back pressure, that make this approach feasible. The big-ticket items to consider are things like:

CPU
The model of NiFi is that basically all processes/tasks are eligible to run on all nodes, and the number of threads and tasks configured per controller and component is applied to all nodes. This could be problematic when there is a substantive imbalance of power across the various systems. If this were important to improve, we could allow node-local overrides of max controller threads. That helps a bit but doesn't really solve it. Again, back pressure is probably the most effective mechanism. There are probably a number of things we could do here if needed.

Disk
We have to consider the speed, congestion, and storage available on the disk(s), and how they're partitioned for our various repositories. Again, back pressure is one of the more effective mechanisms here because it is all about doing as much as you can, which means other nodes should be able to take on more or less. Fortunately, the configuration of the repositories is node-local, so we can have pretty considerable variety here and things work pretty well.

Network
Back pressure for the win. Significant imbalances could lead to significant congestion, though, which could cause inefficiencies in general, so we would need to be careful. That scenario would most likely require wildly imbalanced node capabilities and very high rate flows.

Memory
Variability in JVM heap size and/or differences in off-heap memory could cause some nodes to behave wildly differently from others in ways that back pressure will not necessarily solve. For instance, a node with too small a heap for the types of processes in the flow could yield order(s) of magnitude lower performance than another node. We should do more for these things.
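
A rough illustration of why back pressure helps with imbalanced nodes. This is a toy Python model, not NiFi code: the function name simulate, the per-tick rates, and the threshold are all made up for the example. Each node has a bounded queue, the source only hands work to a node whose queue is below the threshold, and each node drains its queue at its own speed.

```python
from collections import deque


def simulate(rates, threshold, ticks, arrivals_per_tick):
    """Toy back pressure model (hypothetical, not the NiFi implementation).

    rates[i] is how many items node i can process per tick. A node whose
    queue has reached `threshold` refuses new items until it drains.
    Returns (items processed per node, items still held at the source).
    """
    queues = [deque() for _ in rates]
    processed = [0] * len(rates)
    held_at_source = 0

    for _ in range(ticks):
        held_at_source += arrivals_per_tick

        # Offer items round-robin, skipping nodes that are pushing back.
        progress = True
        while held_at_source and progress:
            progress = False
            for q in queues:
                if held_at_source and len(q) < threshold:
                    q.append(1)
                    held_at_source -= 1
                    progress = True

        # Every node works through its own queue at its own speed.
        for i, q in enumerate(queues):
            for _ in range(min(rates[i], len(q))):
                q.popleft()
                processed[i] += 1

    return processed, held_at_source


if __name__ == "__main__":
    done, held = simulate([1, 4], threshold=5, ticks=100, arrivals_per_tick=5)
    print("processed per node:", done, "held at source:", held)
```

With rates of 1 and 4 items per tick, the slow node's queue fills up and stays near the threshold, so the faster node ends up handling roughly four times as much work without any per-node tuning. What this model cannot show is the memory case above, where a node degrades in ways the queue depth does not reflect.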

Users should not have to configure things like swapping thresholds, for instance. We should determine and tune those values at runtime. It is simply too hard to find a good magic number that predicts the likely number and size of flowfile attributes that might be needed, and those can have a substantial impact on heap usage. Right now we treat swapping on a per-queue basis, though it is configured globally. If you have, say, just 100 queues each holding 1000 flowfiles in memory, you have all the attributes of those 100,000 flowfiles in memory. If each flowfile took up just 1KB of memory, we're talking 100+MB. Perhaps a slightly odd example, but users aren't going to go through and think about every queue and the optimal global swapping setting, even though it is an important number. The system should be watching them all and doing this automatically. That could help quite a lot. We may also end up needing to not hold flowfile attributes in memory at all, though supporting this would require API changes to ensure they're only accessed in stream-friendly ways. Doing this for all uses of EL is probably pretty straightforward, but all the direct attribute map accesses would need consideration.

...And we also need to think through things like:

OS differences in accessing resources
We generally follow "Pure Java (tm)" practices where possible, so this helps a lot. But things like accessing specific file paths, as might be needed in flow configurations themselves (GetFile/PutFile, for example), could still be tricky (but doable).

The protocols used to source data matter a lot
With all this talk of back pressure, keep in mind that how data gets into NiFi becomes really critical in these clusters. If you use protocols which do not afford fault tolerance and load balancing, then things are not great. Protocols which have queuing semantics or feedback mechanisms, or which let NiFi as the consumer control things, will work out well. Some portions of JMS are good for this. Kafka is good for this.
NiFi's own site-to-site is good for this.

The frankencluster testing is a valuable way to force and think through interesting issues. Maybe the frankencluster as you have it isn't realistic, but it still exposes the concepts that need to be thought through for cases that definitely are.

Thanks
Joe

On Tue, Sep 27, 2016 at 7:37 AM, Joe Skora <jsk...@gmail.com> wrote:

> The images just show what the text described, 13 files queued, EmptyQueue
> returns 0 of 13 removed, and ListQueue returns the queue has no flowfiles.
>
> There were 13 files of 1k sitting in a queue between a SegmentContent and
> ControlRate. After I sent that email I had to stop/start the processors a
> couple of times for other things and somewhere in the midst of that the
> queue cleared.
>
> On Mon, Sep 26, 2016 at 11:05 PM, Peter Wicks (pwicks) <pwi...@micron.com>
> wrote:
>
>> Joe,
>>
>> I didn’t get the images (might just be my exchange server). How many files
>> are in the queue? (exact count please)
>>
>> --Peter
>>
>> From: Joe Skora [mailto:jsk...@gmail.com]
>> Sent: Monday, September 26, 2016 8:20 PM
>> To: dev@nifi.apache.org
>> Subject: Questions about heterogeneous cluster and queue
>> problem/bug/oddity in 1.0.0
>>
>> I have a 3 node test franken-cluster that I'm abusing for the sake of
>> learning. The systems run Ubuntu 15.04, OS X 10.11.6, and Windows 10 and
>> though far comparable each has quad-core i7 between 2.5 and 3.5 GHz and
>> 16GB of RAM. Two have SSDs and the third has a 7200RPM SATA III drive.
>>
>> 1) Is there any reason mixing operating systems with the cluster would be
>> a bad idea. Once configured it seems to run ok.
>> 2) Will performance disparities affect reliable ability or performance
>> within the cluster?
>> 3) Are there ways to configure disparate systems such that they can all
>> perform at peak?
>>
>> The bug or issues I have run into is a queue showing files that can't be
>> remove or listed. Screen shots attached below. I don't know if it's a
>> mixed-OS issues, something I did while torturing the systems (all stayed
>> up, this time), or just a weird anomaly.
>>
>> Regards,
>> Joe
>>
>> Trying to empty queue seen in background
>> [Inline image 1]
>>
>> but the flowfiles cannot be deleted.
>> [Inline image 2]
>>
>> But try to list them and it says there are no files in the queue?
>> [Inline image 3]
>>