Hello Ioi Lam,
on Monday, 18 November 2019 at 22:21 you wrote:

> https://bugs.openjdk.java.net/browse/JDK-8234363

Thanks for doing that!

> I have not investigated the issue in detail yet. How often do you see 
> ERROR_NO_MORE_FILES happening?

It's difficult to say at the moment because my customer doesn't
monitor such things. So I don't know when it starts to happen, or
whether it then happens every time. From my tests, it seems to start
at some point and to happen occasionally afterwards, maybe even with
increasing frequency. But it still doesn't happen every time: during
my tests there were times at which copying the files succeeded.

I'm trying to get some more logs in the meantime to find a pattern.

> Have you checked if your process
> (apache?) has too many open files such that FindFirstFileW is not able
> to open the directory to get a file listing?

By Apache I meant the Java library providing some I/O helpers[1], not
the web server. My daemon is a plain Java process started manually
from the shell, and I'm fairly sure that I don't have a handle leak in
that process, for the following reasons:

At the point where I copy things and ERROR_NO_MORE_FILES happens, I
don't have any open files myself anymore, and according to the SCM
logs the code hasn't changed. Commons-IO hasn't been updated in years
either, so it is unlikely to have newly introduced leaks.

Besides that, what Process Monitor[2] logs looks exactly the same in
the failure case as in the success case, the only difference being
ERROR_NO_SUCH_FILE vs. ERROR_NO_MORE_FILES. If there were a handle
leak in the process resulting in the IOException, a long-running
process would already fail in the earlier file-related operations,
where I really read and create files on my own. But that's not the
case: all those operations always succeed. Only when it comes to
copying the created files into their target directory do things start
failing at some point, and even then they still succeed sometimes.

But the most important thing in my opinion is that the error persists
across restarts of my daemon, which should clear all open handles in
theory. When the problem happens often, restarting my daemon doesn't
seem to change anything. What instantly solves the problem is clearing
the target directory of the copy operation, either by renaming the old
one and creating a new one or by simply deleting what is currently
present in that directory. ONLY if I do one of those things do the
copy operations start to succeed reliably again, regardless of whether
the daemon is restarted or kept running after failing before.
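
The "rename the old one and create a new one" workaround above could
be scripted roughly like this. It is only a minimal sketch: the class
name TargetDirRotation, the method rotate and the timestamp suffix are
made up for illustration and are not part of my daemon or Commons-IO.
It assumes the renamed directory stays next to the original one, so
the move is a plain rename within the same parent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
final class TargetDirRotation {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmssSSS");

    /**
     * Renames the existing target directory to a timestamped sibling
     * and creates a fresh, empty directory under the original name.
     * Returns the path of the renamed (old) directory.
     */
    static Path rotate(Path target) throws IOException {
        Path absolute = target.toAbsolutePath();
        Path parent = absolute.getParent();
        if (parent == null) {
            throw new IOException("Cannot rotate a root directory: " + absolute);
        }
        String archivedName = absolute.getFileName()
                + "." + LocalDateTime.now().format(STAMP) + ".old";
        Path archived = parent.resolve(archivedName);

        // Rename within the same parent; fails if 'archived' already exists.
        Files.move(absolute, archived);
        // Recreate the empty target so later copy operations find it.
        Files.createDirectory(absolute);
        return archived;
    }
}
```

Calling TargetDirRotation.rotate(targetDir) before a batch of copies
would keep the old contents around for inspection instead of deleting
them.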

I don't care about formerly available contents in the target directory
myself, but am using files with timestamps and stuff like that. And
that's my point: While there surely is some problem somewhere, I think
it's most likely to be in the infrastructure of my customer, because
he has storage-related problems anyway. Things are too slow sometimes
and all that. While I don't see any sign of those problems in ProcMon
when my problem occurs, such as timeouts, permission problems or other
real errors, it might simply be that Windows internally behaves in an
undocumented way for some currently unknown reason.

By allowing ERROR_NO_MORE_FILES, whatever the problem in Windows might
be would simply get ignored up until the point where a real problem
happens. And if no real problem occurs in the end, one doesn't need to
care either. Allowing ERROR_NO_MORE_FILES doesn't look that different
to me from allowing e.g. ERROR_NETWORK_UNREACHABLE, because in my
setup the latter would be the even bigger problem, as I'm copying
things to network shares in the end.

> If that is indeed the case, I am not sure what's the best way of
> handling it. If resource (file descriptors) are running out, perhaps the
> current behavior of throwing an exception in 
> WinNTFileSystem.canonicalize0() would be better than just ignoring it 
> and return an incorrect result. But I'll defer to the folks on the 
> core-libs team.

Please note that Windows has ERROR_TOO_MANY_OPEN_FILES for that, so
in my opinion this is another strong hint that ERROR_NO_MORE_FILES
really is some kind of success. It is only undocumented/unexpected,
but that seems to be the case with many of the other error codes
handled in "lastErrorReportable" as well.
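
To make the distinction concrete, here is a rough Java rendering of
the policy I have in mind. It is NOT the JDK's code and does not claim
to mirror what "lastErrorReportable" currently does; the class and
method names are invented for illustration. Only the numeric values
are the documented Win32 error codes from winerror.h.

```java
// Hypothetical sketch, not the JDK implementation.
final class Win32ErrorSketch {
    // Documented Win32 system error codes (winerror.h).
    static final int ERROR_TOO_MANY_OPEN_FILES = 4;
    static final int ERROR_NO_MORE_FILES = 18;
    static final int ERROR_NETWORK_UNREACHABLE = 1231;

    /**
     * Returns true if the given GetLastError() value should surface
     * as an IOException under the policy proposed in this mail.
     */
    static boolean shouldThrow(int lastError) {
        switch (lastError) {
            case ERROR_NO_MORE_FILES:       // proposed: treat as "nothing found"
            case ERROR_NETWORK_UNREACHABLE: // assumed to be tolerated already
                return false;
            case ERROR_TOO_MANY_OPEN_FILES: // real resource exhaustion
                return true;
            default:                        // anything unknown stays an error
                return true;
        }
    }
}
```

The point of the sketch is only that running out of handles has its
own code (4), so tolerating code 18 would not hide that situation.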

[1]: https://commons.apache.org/proper/commons-io/
[2]: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
