[ 
https://issues.apache.org/jira/browse/TIKA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100057#comment-16100057
 ] 

Karl Buchta edited comment on TIKA-2433 at 7/25/17 1:45 PM:
------------------------------------------------------------

This is how we start the server:

{noformat}
java -Djava.awt.headless=true -jar /opt/tika/tika.jar --server --port 9100 -t
{noformat}


To be more exact, this is our supervisor config:

{noformat}
[program:tika]
priority=2
command=java -Djava.awt.headless=true -jar /opt/tika/tika.jar --server --port 9100 -t
autorestart=true
stopsignal = KILL
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
{noformat}
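
To rule out the server simply not listening after a restart, a check along these lines could be run from the application container. This is only a sketch: `port_open?` is a hypothetical helper name, and the host name in the usage note is an assumption. Only port 9100 comes from the config above.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port
# within `timeout` seconds, false otherwise.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, unreachable host, unresolvable name or timeout.
  false
end
```

For example, `port_open?('localhost', 9100)` should return true while supervisor keeps the Tika process running, assuming the check runs on the same host.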


We write to and read from the server directly via Ruby sockets.
This code did not change during the upgrade, so it worked with 1.15.
Apart from this, we do not use the Tika CLI; this is the only way we call Tika.
{code}
  require 'socket'
  require 'stringio'

  class Tika < ExtractorBase

    def self.run(data, binary = false)
      if binary
        # We use 'b' to read as binary, so we do not destroy an encoding we do not know.
        data = StringIO.new(data, 'rb')
      else
        # TODO: should never be used (text mode, no binary flag)
        data = StringIO.new(data, 'r')
      end

      s = TCPSocket.new(ENV['TIKA_SERVER'], ENV['TIKA_PORT'])
      # Stream the document to the server in 64 KiB chunks.
      while (chunk = data.read(65536))
        s.write(chunk)
      end
      # Half-close the socket so the server sees the end of the input.
      s.shutdown(Socket::SHUT_WR)
      # Read the response until the server closes the connection.
      resp = ''
      loop do
        chunk = s.recv(65536)
        break if chunk.nil? || chunk.empty?
        resp << chunk
      end
      resp
    ensure
      s.close if s
    end

  end
{code}
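
To check the client side in isolation, the same write, half-close, read exchange can be run against a local stand-in server. This is a minimal sketch, not our production code: `exchange` and `start_stub_server` are hypothetical names, and the stub is not Tika. It only reads until end of input and replies with a byte count.

```ruby
require 'socket'

# Sends `payload` to host:port, half-closes the write side, and returns
# everything the peer sends back until it closes the connection.
def exchange(host, port, payload, chunk_size = 65536)
  socket = TCPSocket.new(host, port)
  socket.write(payload)
  socket.shutdown(Socket::SHUT_WR) # signal end of input, keep reading
  response = String.new
  while (chunk = socket.read(chunk_size)) # nil at end of stream
    response << chunk
  end
  response
ensure
  socket.close if socket
end

# Hypothetical stand-in server (NOT Tika): accepts one client, reads
# until the client half-closes, then replies with the byte count.
def start_stub_server
  server = TCPServer.new('127.0.0.1', 0) # port 0: the OS picks a free port
  thread = Thread.new do
    client = server.accept
    body = client.read
    client.write("received #{body.bytesize} bytes")
    client.close
  end
  [server, thread]
end
```

If the exchange completes against the stub but fails against the real server, the problem is more likely on the server side than in the socket handling.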

Thank you very much for your quick reply.



> Tika 1.16 - Nullpointer Exception after update - Asking for help
> ----------------------------------------------------------------
>
>                 Key: TIKA-2433
>                 URL: https://issues.apache.org/jira/browse/TIKA-2433
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.16
>         Environment: Docker - Debian Stretch - Oracle Java
> +Installation in Dockerfile+
> {noformat}
> ENV TIKA_VERSION 1.16
> # also see https://github.com/LogicalSpark/docker-tikaserver/blob/master/Dockerfile
> RUN mkdir -p /opt/tika && cd /opt/tika && curl --fail http://www-eu.apache.org/dist/tika/tika-app-${TIKA_VERSION}.jar -o tika.jar \
>  && curl --fail http://www-eu.apache.org/dist/tika/tika-server-${TIKA_VERSION}.jar -o tika-server.jar \
>  && apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-ita tesseract-ocr-fra tesseract-ocr-spa tesseract-ocr-deu gdal-bin
> {noformat}
> +Tika.xml+
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>     <parsers>
>         <parser class="org.apache.tika.parser.DefaultParser">
>             <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>         </parser>
>     </parsers>
> </properties>
> {noformat}
>            Reporter: Karl Buchta
>
> Hi,
> I would like to kindly ask for help. We had to update to the latest Tika 
> release, 1.16. I have no experience with Tika so far; I am just maintaining 
> the configuration and application from another developer.
> Version 1.15 worked fine for us, but now I see the following error 
> (office is the name of our Docker container, hence the prefix in this output):
> https://github.com/apache/tika/blob/1.16/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java#L202
> {noformat}
> office     | java.lang.NullPointerException
> office     |  at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:202)
> office     |  at org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:1153)
> {noformat}
> I checked the source on GitHub and saw that this part of the code was 
> changed in one of the last commits before the 1.16 release (see the link 
> above).
> I also checked the change notes at https://tika.apache.org/1.16/index.html. 
> As I have not used Tika before, and I cannot tell from the release notes 
> whether the CLI requirements changed, I would like to ask whether that is the 
> case. Do you have any hints on where to start? Could this be due to improper 
> CLI usage, or do you think a Java package or dependency is missing?
> It is hard for me to say, because the CLI commands are automated and spread 
> over several layers and configuration files in the application stack, which 
> is why I am asking for a hint.
> Thanks for any advice. Best, Karl.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
