[
https://issues.apache.org/jira/browse/NIFI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053638#comment-17053638
]
Joe Witt edited comment on NIFI-7222 at 3/6/20, 5:53 PM:
---------------------------------------------------------
[~hdo] This issue is definitely annoying, but by itself it probably doesn't
warrant the amount of community effort necessary to conduct a release process.
For now please feel free to use the snapshot nar I gave you for testing. If we
do end up needing a 1.11.4 I'll plan to grab it; otherwise it will end up in
1.12.
was (Author: joewitt):
[~hdo] This issue is definitely annoying but it by itself probably doesn't
warrant the amount of community effort to fix. For now please feel free to use
the snapshot nar I gave you for testing. If we do end up needing a 1.11.4 I'll
plan to grab it otherwise it will end up in 1.12.
> FetchSFTP appears to not advise the remote system it is done with a given
> resource resulting in too many open files
> -------------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-7222
> URL: https://issues.apache.org/jira/browse/NIFI-7222
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Joe Witt
> Assignee: Joe Witt
> Priority: Major
> Fix For: 1.12.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hi guys,
>
> We have an issue with the FetchSFTP processor and the maximum number of open
> file descriptors. In short, FetchSFTP seems to keep each file open "forever"
> on our Synology NAS, so we always hit the NAS's default limit of 1024 open
> files when we try to fetch 500'000 small 1 MB files (in fact it's not
> possible to read the files, as everything is blocked after 1024 files).
>
> We found no option to raise the max-open-files limit on the Synology NAS
> (but that's not NiFi's fault 😉). We also have another Linux machine running
> CentOS, but the behavior there isn't always the same: sometimes the file
> descriptors get closed, sometimes not.
>
> Synology has no lsof command, but this is how I’ve checked it:
> user@nas-01:~$ sudo ls -l /proc/<SSHD SFTP process PID>/fd | wc -l
> 1024
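The same check can be scripted for repeated sampling; a minimal sketch, assuming a Linux-style /proc filesystem (note that `ls -l … | wc -l` also counts the `total` header line, so it reports one more than the true FD count):

```python
import os

def fd_count(pid: int) -> int:
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Sanity check against the current process; on the NAS you would pass the
# sshd/internal-sftp PID found via `ps aux | grep sftp` instead.
print(fd_count(os.getpid()))
```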
>
> Any comments how we can troubleshoot the issue?
>
> Cheers Josef
> Oh sorry, I missed one of the most important parts: we are using an 8-node
> cluster with NiFi 1.11.3, so we are perfectly up to date.
>
> Cheers Josef
> Hi Joe
>
> OK, regarding our setup: we just bought a new, powerful Synology NAS to use
> as an SFTP server, mainly for NiFi, replacing our current SFTP Linux
> machine. The NAS is empty and configured only for this single use case
> (read/write SFTP from NiFi); nothing else is running there at the moment.
> The important limit is per SSH/user session: ulimit reports a maximum of
> 1024 open files:
>
> root@nas-01:~# ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 62025
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 62025
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
>
> On the NiFi side we are using an 8-node cluster, but it doesn't matter
> whether I use the whole cluster or just a single (primary) node. It's
> clearly visible that the behavior is tied to the number of FetchSFTP
> processors running: if I distribute the load to 8 nodes I see 8 SFTP
> sessions on the NAS and we can fetch 8x1024 files. On the NAS I can also see
> one file descriptor per fetched file (per FetchSFTP processor = PID). In my
> understanding each file should be fetched and its file descriptor closed
> after the transfer, but most of the time this doesn't seem to be the case.
>
> As soon as I stop the FetchSFTP processor, the SFTP session seems to be
> closed and all FDs are gone. So after a stop/start I can fetch another 1024
> files.
>
> So I tried to troubleshoot a bit further and here is what I’ve done in NiFi
> and on the NAS:
>
> (Screenshot of the NiFi flow omitted.)
>
> So I ran a ListSFTP and got 2880 flowfiles; they are load-balanced to a
> single node (to simplify the test and get only one SFTP session on the NAS).
> In ControlRate I transfer 10 flowfiles to FetchSFTP every 10 seconds, which
> correlates directly with the open file descriptors on my NAS, as you can see
> below. Sometimes, and I don't know when or why, the SFTP session is closed
> and everything starts from scratch (that didn't happen here), without any
> notice on the NiFi side. As you can see, the FDs grow by +10 every 10
> seconds, and if I check the paths/filenames of the open FDs I see they are
> exactly the files I've fetched.
>
> root@nas-01:~# ps aux | grep sftp
> root 1740 0.5 0.0 240848 8584 ? Ss 15:01 0:00 sshd:
> ldr@internal-sftp
> root 1753 0.0 0.0 23144 2360 pts/2 S+ 15:01 0:00 grep
> --color=auto sftp
> root 15520 0.0 0.0 241088 9252 ? Ss 13:38 0:02 sshd:
> ldr@internal-sftp
> root@nas-01:~#
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 24
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 34
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 44
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 54
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 64
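The repeated manual checks above can be automated with a small polling loop; a sketch (the PID, interval, and sample count are placeholders, and /proc is Linux-specific):

```python
import os
import time

def watch_fds(pid: int, interval: float = 10.0, samples: int = 5) -> list:
    """Sample the FD count of `pid` every `interval` seconds, printing deltas."""
    counts = []
    prev = None
    for _ in range(samples):
        n = len(os.listdir(f"/proc/{pid}/fd"))
        delta = 0 if prev is None else n - prev
        print(f"fds={n} (delta {delta:+d})")
        counts.append(n)
        prev = n
        time.sleep(interval)
    return counts

# e.g. watch_fds(1740) on the NAS would show the +10-every-10s growth above
```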
>
> root@p-li-nas-01:~# ls -l /proc/1740/fd | head
> total 0
> lr-x------ 1 root root 64 Mar 4 15:01 0 -> pipe:[1086218]
> l-wx------ 1 root root 64 Mar 4 15:01 1 -> pipe:[1086219]
> lr-x------+ 1 root root 64 Mar 4 15:01 10 ->
> /volume1/test/2019-08-31/detail-20190831-0104-92.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 100 ->
> /volume1/test/2019-08-31/detail-20190831-0052-91.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 101 ->
> /volume1/test/2019-08-31/detail-20190831-0340-92.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 102 ->
> /volume1/test/2019-08-31/detail-20190831-0246-91.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 103 ->
> /volume1/test/2019-08-31/detail-20190831-0104-91.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 104 ->
> /volume1/test/2019-08-31/detail-20190831-0150-91.log.gz
> lr-x------+ 1 root root 64 Mar 4 15:03 105 ->
> /volume1/test/2019-08-31/detail-20190831-0013-91.log.gz
> …
>
> So to sum up: one FetchSFTP generates one SFTP session on the NAS. The SFTP
> session holds FDs which, most of the time, don't get closed. This is
> reproducible with the template above, with FetchSFTP against either a CentOS
> machine or a Synology NAS. The main questions now are why the FDs are not
> closed, and when the SFTP session should get closed.
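The mechanism being described, handles opened for a transfer but never closed, can be demonstrated locally; a minimal sketch in which plain local files stand in for the server-side SFTP handles (Linux-only, via /proc/self/fd):

```python
import os
import tempfile

def open_fds() -> int:
    """Count this process's open file descriptors (Linux only)."""
    return len(os.listdir("/proc/self/fd"))

baseline = open_fds()

# Open 10 files without closing them: the FD count grows, just like the
# server-side handles accumulating in the transcript above.
leaked = [tempfile.TemporaryFile() for _ in range(10)]
assert open_fds() >= baseline + 10

# Closing the handles releases the descriptors again -- which is what the
# reporter expects to happen after each FetchSFTP transfer.
for f in leaked:
    f.close()
assert open_fds() <= baseline + 1
```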
>
> Thanks
> I just checked the open sockets on the NiFi machine where FetchSFTP is
> running; of course there is only one SSH session if I use a single
> processor… so the SFTP transfers are carried inside that SSH session.
>
> Open TCP sessions on NiFi
> [user@ nifi-05 ~]$ netstat -vatn | grep x.y.z.232
> tcp 0 0 x.y.z.144:33628 x.y.z.232:22 ESTABLISHED
>
> Any comments are welcome. It's still unclear where the open FDs on the NAS
> (SFTP server) come from, or how this should work from the NiFi perspective.
>
> Cheers Josef
--
This message was sent by Atlassian Jira
(v8.3.4#803005)