[ https://issues.apache.org/jira/browse/TIKA-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sasha Goodman updated TIKA-2604:
--------------------------------
Description:
I've been developing an R interface to the Tika batch processor for the past
month (see: [https://github.com/predict-r/rtika]), and this software is
awesome. I call the batch processor from the command line, and my code has
worked on Ubuntu, Windows 10 and OS X. Several other people have been testing
it as well, and it has been working.
A few days ago I found an issue with the batch processor on OS X.
When I call the batch processor with tika-app-1.17.jar located on a path that
contains a space, Tika keeps restarting.
Here is an example of calling the jar *when the path has a space*. It *produces
this error and the unexpected restarts*:
{code:java}
java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/space folder/tika-app.jar' \
  -maxRestarts 1 -t -i '/' \
  -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' \
  -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
INFO about to start driver
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Must restart process (exitValue=1 numRestarts=0 receivedRestartMessage=false)
INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
INFO The child process has finished with an exit value of: 1
WARN Restarting on unexpected restart code: 1
WARN Hit the maximum number of process restarts. Driver is shutting down now.
INFO Process driver has completed
{code}
The error ALSO occurs when the jar path is wrapped in double quotes instead.
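Both quoting styles should be equivalent here. The shell removes the quotes before `java` starts, so the JVM receives the same single argument containing the space either way. To rule out shell quoting entirely, the batch command can be launched from Java with `ProcessBuilder`, which on OS X and Linux passes each list element to the child process verbatim. The sketch below is a hypothetical repro harness: `TikaBatchRepro` and `batchCommand` are illustrative names and are not part of Tika or rtika. The output and file-list paths are placeholders that should be replaced with real ones.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical repro harness: runs tika-app in batch mode without a shell. */
public class TikaBatchRepro {

    /** Builds the same arguments as the failing command, one list element per argument. */
    static List<String> batchCommand(String jar, String inputDir, String outputDir,
                                     String fileList, int maxRestarts) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.awt.headless=true");
        cmd.add("-jar");
        cmd.add(jar);                          // the space stays inside this single element
        cmd.add("-maxRestarts");
        cmd.add(Integer.toString(maxRestarts));
        cmd.add("-t");
        cmd.add("-i");
        cmd.add(inputDir);
        cmd.add("-o");
        cmd.add(outputDir);
        cmd.add("-fileList");
        cmd.add(fileList);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String jar = "/Users/sasha/Downloads/space folder/tika-app.jar";
        // Placeholder output directory and file list (hypothetical paths).
        List<String> cmd = batchCommand(jar, "/", "/tmp/rtika_out", "/tmp/rtika_files.txt", 1);
        System.out.println(cmd);
        if (!new File(jar).isFile()) {
            System.out.println("jar not found; not launching");
            return;
        }
        // No shell is involved: on Unix-like systems each element is passed to java as-is.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("exit value: " + p.waitFor());
    }
}
```

If the `ClassNotFoundException` still appears when the command is launched this way, that would point to the batch driver itself rather than to how the caller quotes the path.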
*Now, in contrast,* calling the jar when the *path does not have spaces
produces absolutely NO error*:
{code:java}
java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/tika-app.jar' \
  -maxRestarts 1 -t -i '/' \
  -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' \
  -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
INFO about to start driver
INFO BatchProcess: log4j:WARN No appenders could be found for logger (org.apache.tika.batch.fs.FSBatchProcessCLI).
INFO BatchProcess: log4j:WARN Please initialize the log4j system properly.
INFO BatchProcess: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: TIFFImageWriter not loaded. tiff files will not be processed
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess: J2KImageReader not loaded. JPEG2000 files will not be processed.
INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
INFO BatchProcess: for optional dependencies.
INFO BatchProcess:
INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
INFO BatchProcess: WARNING: org.xerial's sqlite-jdbc is not loaded.
INFO BatchProcess: Please provide the jar on your classpath to parse sqlite files.
INFO BatchProcess: See tika-parsers/pom.xml for the correct version.
INFO BatchProcess: randomCrawl attribute is ignored by FSListCrawler
BatchProcess:Main thread in TikaFSBatchCLI has finished processing.
BatchProcess:
BatchProcess:
BatchProcess:ParallelFileProcessingResult{considered=1, added=1, consumed=1, numberHandledExceptions=0, secondsElapsed=0.853, exitStatus=0, causeForTermination='COMPLETED_NORMALLY'}
INFO The child process has finished with an exit value of: 0
INFO Process driver has completed
{code}
Further, and this is what makes it a batch processor issue, the same path with
the space in it produces absolutely *NO error in the normal Tika CLI mode
either*:
{code:java}
java -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -t \
  /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rtika/extdata/jsonlite.pdf
{code}
The last two examples work, but the first does not.
The only difference is that the first one calls the batch processor, and the
batch processor restarts regardless of which file it is given.
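The failing log shows that batch mode starts a separate child JVM ("about to start driver", "The child process has finished...") and that this child cannot load `org.apache.tika.batch.fs.FSBatchProcessCLI`. This suggests the classpath handed to the child does not point at the jar. The plain CLI mode presumably runs in a single JVM and never rebuilds a classpath, which would be consistent with it working.

This has not been confirmed against the 1.17 source. One plausible mechanism, stated here only as an assumption, is that the driver wraps a classpath containing a space in literal double quotes. `ProcessBuilder` on OS X and Linux passes each argument verbatim without a shell, so those quote characters would become part of the path and name a file that does not exist. The sketch below demonstrates only that general effect. `QuotedClasspathDemo`, `quoteIfSpaced` and `resolves` are hypothetical names, not Tika code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical illustration (not Tika's code): a classpath wrapped in literal
 * double quotes names a different, nonexistent file if the quotes reach the
 * child JVM unchanged, as they would with ProcessBuilder on Unix-like systems.
 */
public class QuotedClasspathDemo {

    /** Hypothetical helper mimicking "quote the classpath if it contains a space". */
    static String quoteIfSpaced(String cp) {
        return cp.contains(" ") ? "\"" + cp + "\"" : cp;
    }

    /** True if the string, taken literally as a path, names an existing regular file. */
    static boolean resolves(String cp) {
        try {
            return Files.isRegularFile(Paths.get(cp));
        } catch (InvalidPathException e) {
            return false; // e.g. '"' is not a legal path character on Windows
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for '/Users/sasha/Downloads/space folder/tika-app.jar'.
        Path dir = Files.createTempDirectory("tika").resolve("space folder");
        Files.createDirectories(dir);
        Path jar = Files.createFile(dir.resolve("tika-app.jar"));

        String plain = jar.toString();
        String quoted = quoteIfSpaced(plain);

        System.out.println("plain:  " + plain + " -> " + resolves(plain));
        System.out.println("quoted: " + quoted + " -> " + resolves(quoted));
    }
}
```

On OS X or Linux this should report `true` for the plain path and `false` for the quoted one. That is the kind of mismatch that would leave the child JVM unable to find `FSBatchProcessCLI`.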
> Error with certain jar paths on OS X
> ------------------------------------
>
> Key: TIKA-2604
> URL: https://issues.apache.org/jira/browse/TIKA-2604
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.17
> Environment: tika-app-1.17.jar, OS X 10.13.3.
>
> Reporter: Sasha Goodman
> Priority: Major
>