Hi,

My name is Rifat. I am a Software Engineer at ESPN/Disney. I have been using 
NiFi for almost one year now, and we have a 10-node NiFi cluster set up in our 
production environment. As per the best practices document:

https://community.cloudera.com/t5/Community-Articles/NiFi-Sizing-Guide-Deployment-Best-Practices/ta-p/246781

I would like to have 5 separate repositories for the Content Repo, 5 for the 
Provenance Repo, and 1 for the FlowFile Repo per node. I need your expert advice 
on whether EBS or EFS is the better approach to achieve this. I already tried 
EFS and saw some problems with load balancing, since I mounted a separate EFS 
volume per partition (that's 11 EFS volumes per node). This is the response I 
received from AWS after raising a support ticket with them:
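
For reference, ahead of the quoted AWS reply: the layout described above (1 
FlowFile, 5 Content and 5 Provenance repositories per node) could be expressed 
in nifi.properties roughly as generated by the sketch below. The property name 
prefixes are the ones the NiFi Admin Guide documents for multiple repository 
directories; the /mnt mount points and the suffix names (content1, 
provenance1, ...) are assumptions made for illustration only.

```python
# Sketch: emit nifi.properties entries for 1 FlowFile, 5 Content and
# 5 Provenance repositories. The /mnt paths and suffix names are hypothetical.

def repo_properties(base="/mnt", content_count=5, provenance_count=5):
    """Return a dict of nifi.properties keys mapped to repository directories."""
    props = {"nifi.flowfile.repository.directory": f"{base}/flowfile_repo"}
    for kind, count in (("content", content_count), ("provenance", provenance_count)):
        for i in range(1, count + 1):
            # Each unique suffix names one repository directory, ideally
            # backed by its own volume on the node.
            key = f"nifi.{kind}.repository.directory.{kind}{i}"
            props[key] = f"{base}/{kind}_repo{i}"
    return props


if __name__ == "__main__":
    for key, value in repo_properties().items():
        print(f"{key}={value}")
```

Whichever storage backs these paths (EBS, EFS or Instance Store), the entries 
stay the same; only what is mounted at each directory changes.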


Hello Rifat,

Thank you for contacting AWS Premium Support.

I am not familiar with Apache NiFi, as this is a third-party software not 
covered by our support policy [1]. That being said, I had a look at its 
documentation [2] and some related links, and it doesn't seem to me that these 
repositories are meant to be on shared storage accessible by all nodes. If you 
look at the NiFi Architecture section, it's suggested that this data is stored 
locally on each node, and the guidelines in the link you provided us with are 
aligned with that principle. I did find some connection between NiFi and Hadoop 
Distributed File System (HDFS), which has some fundamental differences to EFS, 
however it doesn't seem to have any relation with these repositories.

While EFS itself provides strong durability and availability guarantees, the 
NFS protocol is meant to provide weaker cache coherence among its clients as a 
trade-off for higher performance. Characteristics such as Attribute Caching, 
Directory Entry Caching, Asynchronous writes, and the differences in how file 
timestamps are maintained lead to discrepancies in how each node sees data, 
potentially impacting clustered applications expecting strong consistency. 
You'll find a good write-up on that in the Linux NFS documentation [3], section 
"Data and Metadata Coherence".

To see if one of these characteristics is causing the issue, I advise you to 
append 'sync' and 'noac' as mount options for all EFS resources in all nodes; 
the first one will cause all write I/O to become synchronous, and the second 
one will disable Attribute and Directory Entry caching. If that helps resolve 
the issues you are seeing, we'll know that NiFi is expecting strong cache 
coherence. However, you'll need to evaluate if the performance penalty of 
mounting with these options is bearable. It may be possible that EBS or even 
Instance Store are better options to host these repositories, provided that you 
understand the differences in performance and durability between the two.
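
As a rough illustration of the suggestion above, the following sketch appends 
'sync' and 'noac' to an existing NFS option string and builds the matching 
/etc/fstab entry. The file system DNS name, the mount point and the base 
options are placeholders, not values from this support case, and the helper 
names are hypothetical.

```python
# Sketch: build an /etc/fstab entry for an NFS-mounted EFS file system with
# 'sync' and 'noac' appended. DNS name, mount point and base options below
# are placeholders chosen for illustration.

def add_mount_options(existing, extra=("sync", "noac")):
    """Append each extra option to a comma-separated list unless already present."""
    options = [opt for opt in existing.split(",") if opt]
    for opt in extra:
        if opt not in options:
            options.append(opt)
    return ",".join(options)


def fstab_line(dns_name, mount_point, options):
    # Fields: device, mount point, file system type, options, dump, fsck pass
    return f"{dns_name}:/ {mount_point} nfs4 {options} 0 0"


if __name__ == "__main__":
    base = "nfsvers=4.1,hard,timeo=600,retrans=2"
    opts = add_mount_options(base)
    print(fstab_line("fs-EXAMPLE.efs.us-east-1.amazonaws.com",
                     "/mnt/content_repo1", opts))
```

The same option string would have to be applied to every EFS mount on every 
node for the test to be meaningful.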

On a side note, you are missing a few of the recommended mount options for EFS. 
Although I don't expect them to cause an immediate impact for the issue 
described in this support case, it's a good idea to implement them to avoid 
other issues. Please check here [4] for details.

Regarding your question on how to enable communication between directories that 
are mounted on a different EFS, this whole idea of inter-EFS communication does 
not apply. EFS is a file system, and there's no exchange of data between 
separate EFS resources; the only "communication" in that sense would be moving 
data from one EFS to another, which can be done within an instance having both 
file systems mounted. I believe that at this stage, testing the solution with 
the proposed mount options above is a good course of action to isolate the 
problem.

With regard to your comment on logging into these machines and reading the 
contents of /var/log/nifi, please note that Support personnel are not allowed 
under any circumstances to access customers' instances. At this stage, I 
believe that these logs are not required for this case.

To summarise, my first recommendation is that you seek advice from NiFi experts 
on whether using a distributed file system such as EFS to host the cluster 
nodes' repositories is a valid approach. If you are using Cloudera Flow 
Management, you should be able to receive support from Cloudera; otherwise, the 
NiFi Community is an option [6]. The second recommendation is to test EFS 
mounted with 'sync' and 'noac' to see if it helps resolve the issue; if the 
performance penalty is unbearable, consider switching the repositories to EBS or 
Instance Store volumes.

If you have questions on the above, please let me know.  


Please let me know the best approach to solve this problem.


Best Regards, Rifat

  
