[ 
https://issues.apache.org/jira/browse/SOLR-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242927#comment-16242927
 ] 

Hoss Man commented on SOLR-9120:
--------------------------------

I'm very confused by some of the back and forth comments here -- beyond the 
fact that in some cases people seem to be talking about *different* bugs 
(SOLR-11616) that are completely unrelated to the LukeRequestHandler problem 
here (similar only because they both involve NoSuchFileException on segments 
files, even though the code is completley different) 

in particular...

bq. It may be a good idea to actually figure out what is the root cause of this 
behavior. Seems quite weird to me. There may be some fundamental issue at the 
lucene level ?

...i don't understand this confusion about the "root cause" of where/why the 
NoSuchFileException exception comes from ... that was very clearely spelled out 
in the discussion in SOLR-8793, which lead to the creation of SOLR-8587 -- both 
of which are linked to this jira...

From: SOLR-8793
bq. Lucene deletes the files no longer referenced by an index (through any 
commit) so they disappear on the directory listing. IndexReaders still having 
them open on older index state are not affected ("delete on last close" POSIX 
semantics) - the inode is still existing, just the directory entry is gone. The 
inode is deleted if nothing refers to it anymore. This is different on Windows 
under some circumstances (MMapDirectory).
bq. OK yea you're right, I was confused. The file can be read by the open IR, 
but won't appear in directory listing. I opened SOLR-8793 to fix this, sorry 
for that!

At which point, in SOLR-8587, [~shaie] fixed the code so that instead of 
propgating the NoSuchFileException up out and failing the LukeRequestHandler 
request, the code would simply _warn_ about the Exception, and the request 
would succeed, ala this code...

{code}
  private static long getFileLength(Directory dir, String filename) {
    try {
      return dir.fileLength(filename);
    } catch (IOException e) {
      // Whatever the error is, only log it and return -1.
      log.warn("Error getting file length for [{}]", filename, e);
      return -1;
    }
  }

{code}

...which is why in comments above, people report seeing exceptions like this in 
their logs (_including the "Error getting file length" info from 
getFileLength_)...

{noformat}
2017-02-16 15:56:00.122 WARN  (qtp895947612-44262) [   x:inclive] 
o.a.s.h.a.LukeRequestHandler Error getting file length
for [segments_cr3]
java.nio.file.NoSuchFileException: 
/index/solr6/data/data/inc_1/index/segments_cr3
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
        at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
        at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at java.nio.file.Files.size(Files.java:2332)
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
        at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
        at 
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:598)
{noformat}

...but the {{/luke}} requests themselves don't actually fail.

----

As things stand now, the code is doing exactly what it was designed to do:
 * make an attempt at reporting the _on disk_ size of the segments file 
currently in use by the reader
 * if the size of the segments file can't be determined due to any IO 
Exception, return -1 log a WARN about the IO Exception

The question becomes: do we want to _change_ that logging?

LukeRequestHandler is currently making no assumptions about wether or not any 
IO Exception it encounters is "bad" or not -- from the perspective of that 
class the sitation is non fatal so it's not logging as an ERROR, just a WARNing 
... we could remove this logging, or change it to INFO or DEBUG -- but then 
LukeRequestHandler would be making some explicit assumptions that this type of 
exception is "OK" ... and in many cases it is, but what if in some cases it 
isn't?

I suspect this is why Shai used WARN in the first place -- to avoid making any 
explicit assumptions about how sever the IOException is -- we could easily 
change this to be less noisy, but should we?

My vote would be "yes": log it only at a DEBUG level and update the message to 
something like "Ignoring exception attempting to check (optional) fileSize stat 
for: <filename>"

----

As for the attached patch: I don't understand the nuances of why getting the 
last commit from the DeletionPolicy ma/may-not be better then getting it from 
the IndexReader, but it doesn't change the fundemental root of the issue: a 
commit point may be refering to a (segments) file that is still *usable* by 
already open file handles, but no longer has an entry in the directory listing 
so it does not *exist* if you try to ask the Directory for it's size. changing 
which commit we ask doesn't change the underlying filesystem semantics.

in other words: 
* the patch as is can never fx the underlying source of the NoSuchFileException
* if the patch is a good idea for other reasons, thne that should be spelled 
out in a distinct issue explaining why exactly LukeRequestHandler should _in 
general, for all aspects of index info) be using 
{{core.getDeletionPolicy().getLatestCommit()}} instead of 
{{reader.getIndexCommit()}}

> o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_NNN] -- 
> NoSuchFileException
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9120
>                 URL: https://issues.apache.org/jira/browse/SOLR-9120
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 6.0
>            Reporter: Markus Jelsma
>         Attachments: SOLR-9120.patch, SOLR-9120.patch
>
>
> On Solr 6.0, we frequently see the following errors popping up:
> {code}
> java.nio.file.NoSuchFileException: 
> /var/lib/solr/logs_shard2_replica1/data/index/segments_2c5
>       at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>       at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>       at 
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>       at 
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>       at java.nio.file.Files.readAttributes(Files.java:1737)
>       at java.nio.file.Files.size(Files.java:2332)
>       at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
>       at 
> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:131)
>       at 
> org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:597)
>       at 
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:585)
>       at 
> org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:137)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2033)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:518)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>       at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to