[ https://issues.apache.org/jira/browse/TIKA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Hallett updated TIKA-2794:
-------------------------------
    Environment: 
MacBook Pro and Windows Server 2012

This code extracts text from the enclosed PDF file on a MacBook, but not on
Windows Server.

  was:
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true'}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)

    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))

    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)

This code extracts text from the enclosed PDF file on a MacBook, but not on
Windows Server.

    Description: 
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true'}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)
    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))
    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)



  was:
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true'}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)
    #data = parser.from_file(pathtofile, self.TIKA_SERVER)

    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))

    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)


> Tika extracts text from pdf on MacBook, but not windows server.,
> ----------------------------------------------------------------
>
>                 Key: TIKA-2794
>                 URL: https://issues.apache.org/jira/browse/TIKA-2794
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.19.1
>         Environment: MacBook Pro and Windows Server 2012
> This code extracts text from the enclosed PDF file on a MacBook, but not on
> Windows Server.
>            Reporter: Paul Hallett
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: test2.pdf
>
>
> try:
>     headers = {'X-Tika-PDFextractInlineImages': 'true'}
>     data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
>                             headers=headers)
>     charstoreturn = data['content'].strip().split()[:limit]
>     charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
>                      .replace('"', "'").replace(",", "").replace("'", "'"))
>     return True, charstoreturn
> except Exception as err:
>     return False, "error {} on file: {}.\n".format(str(err), pathtofile)
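
The snippet above is a fragment (its return statements sit outside any
function), so the following is only a minimal sketch of its text
post-processing step as a standalone function. The name clean_extracted_text
is hypothetical and appears neither in the report nor in tika-python. The
sketch also guards against data['content'] being None, which tika-python
can return when no text is extracted. Whether that is what happens on the
Windows server is an assumption; the report does not say.

```python
def clean_extracted_text(content, limit):
    """Return the first `limit` whitespace-separated words of `content` on one line.

    Hypothetical helper that mirrors the string handling in the reported snippet.
    """
    if content is None:
        # Assumption: no text came back from the server. The reported snippet
        # would raise AttributeError on .strip() here and fall into its
        # except branch instead.
        return ""
    words = content.strip().split()[:limit]
    # split() with no arguments already discards newlines, so only double
    # quotes and commas still need to be handled after joining.
    return ' '.join(words).replace('"', "'").replace(",", "")
```

In the reported code this could be called as
clean_extracted_text(data['content'], limit) right after parser.from_file(...).
An empty result would then separate "the server returned no text" from a
raised exception, which may help narrow down where the two machines differ.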



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
