[
https://issues.apache.org/jira/browse/SOLR-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242927#comment-16242927
]
Hoss Man commented on SOLR-9120:
--------------------------------
I'm very confused by some of the back and forth comments here -- beyond the
fact that in some cases people seem to be talking about *different* bugs
(SOLR-11616) that are completely unrelated to the LukeRequestHandler problem
here (similar only because they both involve NoSuchFileException on segments
files, even though the code is completley different)
in particular...
bq. It may be a good idea to actually figure out what is the root cause of this
behavior. Seems quite weird to me. There may be some fundamental issue at the
lucene level ?
...i don't understand this confusion about the "root cause" of where/why the
NoSuchFileException exception comes from ... that was very clearely spelled out
in the discussion in SOLR-8793, which lead to the creation of SOLR-8587 -- both
of which are linked to this jira...
From: SOLR-8793
bq. Lucene deletes the files no longer referenced by an index (through any
commit) so they disappear on the directory listing. IndexReaders still having
them open on older index state are not affected ("delete on last close" POSIX
semantics) - the inode is still existing, just the directory entry is gone. The
inode is deleted if nothing refers to it anymore. This is different on Windows
under some circumstances (MMapDirectory).
bq. OK yea you're right, I was confused. The file can be read by the open IR,
but won't appear in directory listing. I opened SOLR-8793 to fix this, sorry
for that!
At which point, in SOLR-8587, [~shaie] fixed the code so that instead of
propgating the NoSuchFileException up out and failing the LukeRequestHandler
request, the code would simply _warn_ about the Exception, and the request
would succeed, ala this code...
{code}
private static long getFileLength(Directory dir, String filename) {
try {
return dir.fileLength(filename);
} catch (IOException e) {
// Whatever the error is, only log it and return -1.
log.warn("Error getting file length for [{}]", filename, e);
return -1;
}
}
{code}
...which is why in comments above, people report seeing exceptions like this in
their logs (_including the "Error getting file length" info from
getFileLength_)...
{noformat}
2017-02-16 15:56:00.122 WARN (qtp895947612-44262) [ x:inclive]
o.a.s.h.a.LukeRequestHandler Error getting file length
for [segments_cr3]
java.nio.file.NoSuchFileException:
/index/solr6/data/data/inc_1/index/segments_cr3
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
at
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:598)
{noformat}
...but the {{/luke}} requests themselves don't actually fail.
----
As things stand now, the code is doing exactly what it was designed to do:
* make an attempt at reporting the _on disk_ size of the segments file
currently in use by the reader
* if the size of the segments file can't be determined due to any IO
Exception, return -1 log a WARN about the IO Exception
The question becomes: do we want to _change_ that logging?
LukeRequestHandler is currently making no assumptions about wether or not any
IO Exception it encounters is "bad" or not -- from the perspective of that
class the sitation is non fatal so it's not logging as an ERROR, just a WARNing
... we could remove this logging, or change it to INFO or DEBUG -- but then
LukeRequestHandler would be making some explicit assumptions that this type of
exception is "OK" ... and in many cases it is, but what if in some cases it
isn't?
I suspect this is why Shai used WARN in the first place -- to avoid making any
explicit assumptions about how sever the IOException is -- we could easily
change this to be less noisy, but should we?
My vote would be "yes": log it only at a DEBUG level and update the message to
something like "Ignoring exception attempting to check (optional) fileSize stat
for: <filename>"
----
As for the attached patch: I don't understand the nuances of why getting the
last commit from the DeletionPolicy ma/may-not be better then getting it from
the IndexReader, but it doesn't change the fundemental root of the issue: a
commit point may be refering to a (segments) file that is still *usable* by
already open file handles, but no longer has an entry in the directory listing
so it does not *exist* if you try to ask the Directory for it's size. changing
which commit we ask doesn't change the underlying filesystem semantics.
in other words:
* the patch as is can never fx the underlying source of the NoSuchFileException
* if the patch is a good idea for other reasons, thne that should be spelled
out in a distinct issue explaining why exactly LukeRequestHandler should _in
general, for all aspects of index info) be using
{{core.getDeletionPolicy().getLatestCommit()}} instead of
{{reader.getIndexCommit()}}
> o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_NNN] --
> NoSuchFileException
> ------------------------------------------------------------------------------------------------
>
> Key: SOLR-9120
> URL: https://issues.apache.org/jira/browse/SOLR-9120
> Project: Solr
> Issue Type: Bug
> Affects Versions: 6.0
> Reporter: Markus Jelsma
> Attachments: SOLR-9120.patch, SOLR-9120.patch
>
>
> On Solr 6.0, we frequently see the following errors popping up:
> {code}
> java.nio.file.NoSuchFileException:
> /var/lib/solr/logs_shard2_replica1/data/index/segments_2c5
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.Files.size(Files.java:2332)
> at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
> at
> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:131)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:597)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:585)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:137)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2033)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:518)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]