Konstantin Avdeev created CONNECTORS-1312:
---------------------------------------------

             Summary: jcifs.smb.SmbException: Connection reset by peer: socket write error
                 Key: CONNECTORS-1312
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1312
             Project: ManifoldCF
          Issue Type: Bug
          Components: JCIFS connector
    Affects Versions: ManifoldCF 2.5
         Environment: Windows x64, java 1.8.x
            Reporter: Konstantin Avdeev


Hi Karl,

we've found another JCIFS exception: jobs crawling Windows shares stop when they encounter a "Connection reset by peer" error, e.g.:
{code}
ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException tossed processing smb://server.domain.com/path/file.ppt
jcifs.smb.SmbException: Connection reset by peer: socket write error
java.net.SocketException: Connection reset by peer: socket write error
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453)
        at jcifs.util.transport.Transport.sendrecv(Transport.java:67)
        at jcifs.smb.SmbTransport.send(SmbTransport.java:655)
        at jcifs.smb.SmbSession.send(SmbSession.java:238)
        at jcifs.smb.SmbTree.send(SmbTree.java:119)
        at jcifs.smb.SmbFile.send(SmbFile.java:775)
        at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
        at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at java.nio.file.Files.copy(Files.java:2908)
        at java.nio.file.Files.copy(Files.java:3027)
        at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587)
        at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615)
        at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358)
        at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
        at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
{code}

The current workaround is to restart the job (manually or via the scheduler).
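Since "Connection reset by peer" is usually transient, the manual restart could in principle be replaced by an automatic retry around the read. A minimal sketch of that idea (this is not ManifoldCF's actual retry machinery; the class and method names are hypothetical):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper: retry an operation a few times when it fails with a
 * transient "Connection reset by peer" IOException, instead of letting the
 * exception abort the whole job.
 */
public class TransientRetry {
    public static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                String msg = e.getMessage();
                // Only treat the connection-reset case as retryable;
                // anything else is rethrown immediately.
                if (msg == null || !msg.contains("Connection reset by peer"))
                    throw e;
                last = e;
            }
        }
        throw last; // all attempts exhausted
    }
}
```

In the SharedDriveConnector case, the `op` would wrap the SMB read; a real implementation would also want a backoff between attempts.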

Clearly, there are many errors for which it makes no sense to skip the failed URL and continue the job, e.g.:
{code}
Error: SmbAuthException thrown: Logon failure: unknown user name or bad password.
{code}

I'm thinking about a general solution: defining a list (through the UI or properties.xml) of non-severe exceptions, such as "file busy" or "symlink detected", so that admins could specify when the crawler should stop and when it should retry, skip the document, and move on.
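To make the proposal concrete, here is a rough sketch of what such a classification could look like: a list of message substrings, loadable from a property, that marks matching exceptions as non-severe. The class name, the property format, and the substring-matching approach are all assumptions, not an existing ManifoldCF API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Sketch of an admin-configurable exception policy: exceptions whose message
 * contains one of the configured substrings are treated as non-severe (the
 * document is skipped and the job continues); anything else stops the job.
 */
public class ExceptionPolicy {
    private final List<String> nonSevereSubstrings;

    public ExceptionPolicy(List<String> nonSevereSubstrings) {
        this.nonSevereSubstrings = nonSevereSubstrings;
    }

    /** Could be fed from a comma-separated value in properties.xml. */
    public static ExceptionPolicy fromProperty(String csv) {
        return new ExceptionPolicy(Arrays.asList(csv.split("\\s*,\\s*")));
    }

    /** True if the exception matches the configured non-severe list. */
    public boolean shouldSkipAndContinue(Exception e) {
        String msg = e.getMessage();
        if (msg == null)
            return false;
        String lower = msg.toLowerCase(Locale.ROOT);
        for (String s : nonSevereSubstrings)
            if (lower.contains(s.toLowerCase(Locale.ROOT)))
                return true;
        return false;
    }
}
```

With a policy like `"Connection reset by peer, file busy"`, the socket error above would be skipped, while the SmbAuthException logon failure would still stop the job.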

What do you think?
Thank you!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)