Sasha Goodman created TIKA-2604:
-----------------------------------

             Summary: Error with certain jar paths on OS X
                 Key: TIKA-2604
                 URL: https://issues.apache.org/jira/browse/TIKA-2604
             Project: Tika
          Issue Type: Bug
          Components: cli
    Affects Versions: 1.17
         Environment: tika-app-1.17.jar, OS X 10.13.3. 

 
            Reporter: Sasha Goodman


I've been developing an R interface to the Tika batch processor for the past 
month ( see: [https://github.com/predict-r/rtika] ), and this software is 
awesome. I use the command line to call the batch processor, and my code has 
worked on Ubuntu, Windows 10 and OS X. Several people have been testing my code 
as well. It's been working.

A few days ago I found an issue with the batch processor on OS X. 

When calling the batch processor with the tika-app-1.17.jar on a path with 
spaces in it, Tika starts to continually restart.

Here is an example of calling the jar *when the path has spaces.* It 
produces this *error and unexpected restart*: 
{code:java}
java -Djava.awt.headless=true \
  -jar '/Users/sasha/Downloads/space folder/tika-app.jar' \
  -maxRestarts 1 -t -i '/' \
  -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' \
  -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'

INFO about to start driver
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Must restart process (exitValue=1 numRestarts=0 receivedRestartMessage=false)
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Hit the maximum number of process restarts. Driver is shutting down now.
INFO Process driver has completed{code}
The error also occurs when the jar path is wrapped in double quotes instead of single quotes.
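
Both quoting styles failing the same way is expected at the shell level: the shell strips either kind of quote and hands java the identical argument. The sketch below illustrates this with a hypothetical `show_args` helper (not part of Tika or rtika). It suggests, but does not prove, that the path is mishandled after the JVM has started rather than by the shell.

```shell
#!/bin/sh
# Hypothetical helper: print each argument it receives on its own line,
# wrapped in angle brackets so embedded spaces are visible.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# The same jar path, once in single quotes and once in double quotes.
single=$(show_args -jar '/Users/sasha/Downloads/space folder/tika-app.jar')
double=$(show_args -jar "/Users/sasha/Downloads/space folder/tika-app.jar")

# Both quoting styles yield the same two arguments.
[ "$single" = "$double" ] && echo "identical argv"
printf '%s\n' "$single"
# prints:
#   identical argv
#   <-jar>
#   </Users/sasha/Downloads/space folder/tika-app.jar>
```

Because the path arrives as one intact argument in both cases, changing the quoting on the command line would not be expected to change the outcome.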

*In contrast,* calling the jar when the *path does not have spaces produces 
absolutely NO error*:
{code:java}
java -Djava.awt.headless=true \
  -jar '/Users/sasha/Downloads/tika-app.jar' \
  -maxRestarts 1 -t -i '/' \
  -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' \
  -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
INFO about to start driver
INFO BatchProcess: log4j:WARN No appenders could be found for logger (org.apache.tika.batch.fs.FSBatchProcessCLI).
INFO BatchProcess: log4j:WARN Please initialize the log4j system properly.
INFO BatchProcess: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: TIFFImageWriter not loaded. tiff files will not be processed
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: J2KImageReader not loaded. JPEG2000 files will not be processed.
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess:
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: org.xerial's sqlite-jdbc is not loaded.
INFO BatchProcess: Please provide the jar on your classpath to parse sqlite files.
INFO BatchProcess: See tika-parsers/pom.xml for the correct version.
INFO BatchProcess: randomCrawl attribute is ignored by FSListCrawler
BatchProcess:Main thread in TikaFSBatchCLI has finished processing.
BatchProcess:
BatchProcess:
BatchProcess:ParallelFileProcessingResult{considered=1, added=1, consumed=1, numberHandledExceptions=0, secondsElapsed=0.853, exitStatus=0, causeForTermination='COMPLETED_NORMALLY'}
INFO The child process has finished with an exit value of: 0
INFO Process driver has completed{code}
 

 

Further, and what makes this a batch processor issue, the same path with 
the space in it produces absolutely *NO error in the normal Tika CLI mode*: 

 

 
{code:java}
java -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -t 
/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rtika/extdata/jsonlite.pdf

{code}
 

The last two examples work, but the first does not. 

The only difference between the first and last examples is that the first 
calls the batch processor, and it fails no matter which input file is used.
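
Until the batch processor handles such paths, one possible workaround is to launch it from a copy of the jar whose path has no spaces. The sketch below is a minimal illustration that has not been tested against Tika. `space_free_jar` and the `tika-jar.XXXXXX` directory template are hypothetical names, and the sketch assumes the batch failure depends only on the jar's own path, which this report suggests but does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika or rtika): print a path to the jar
# that contains no spaces, copying the jar to a temp directory if needed.
space_free_jar() {
    jar=$1
    case $jar in
        *" "*)
            # Path contains a space: stage a copy somewhere space-free.
            base=${TMPDIR:-/tmp}
            # Fall back to /tmp if the temp directory itself has a space.
            case $base in *" "*) base=/tmp ;; esac
            staged=$(mktemp -d "${base%/}/tika-jar.XXXXXX") || return 1
            cp "$jar" "$staged/tika-app.jar" || return 1
            printf '%s\n' "$staged/tika-app.jar"
            ;;
        *)
            # No space: use the jar where it is.
            printf '%s\n' "$jar"
            ;;
    esac
}

# Example use (commented out so the sketch runs without a Tika jar):
#   jar=$(space_free_jar '/Users/sasha/Downloads/space folder/tika-app.jar')
#   java -Djava.awt.headless=true -jar "$jar" -maxRestarts 1 -t -i '/' \
#     -o "$out_dir" -fileList "$file_list"
```

Copying rather than symlinking is a deliberate choice here, so the result does not depend on how the JVM resolves links. The caller is responsible for removing the staged directory afterwards.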

 



