[ https://issues.apache.org/jira/browse/TIKA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Hallett updated TIKA-2794:
-------------------------------
Environment:
MacBook Pro and Windows Server 2012
This code works on the enclosed PDF file on a MacBook, but not on Windows
Server.
was:
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true',}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)
    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))
    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)
This code works on the enclosed PDF file on a MacBook, but not on Windows
Server.
Description:
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true',}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)
    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))
    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)
was:
try:
    headers = {'X-Tika-PDFextractInlineImages': 'true',}
    data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
                            headers=headers)
    #data = parser.from_file(pathtofile, self.TIKA_SERVER)
    charstoreturn = data['content'].strip().split()[:limit]
    charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
                     .replace('"', "'").replace(",", "").replace("'", "'"))
    return True, charstoreturn
except Exception as err:
    return False, "error {} on file: {}.\n".format(str(err), pathtofile)
> Tika extracts text from pdf on MacBook, but not windows server.,
> ----------------------------------------------------------------
>
> Key: TIKA-2794
> URL: https://issues.apache.org/jira/browse/TIKA-2794
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.19.1
> Environment: MacBook Pro and Windows Server 2012
> This code works on the enclosed PDF file on a MacBook, but not on Windows
> Server.
> Reporter: Paul Hallett
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: test2.pdf
>
>
> try:
>     headers = {'X-Tika-PDFextractInlineImages': 'true',}
>     data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER,
>                             headers=headers)
>     charstoreturn = data['content'].strip().split()[:limit]
>     charstoreturn = (' '.join(charstoreturn).replace("\n", " ")
>                      .replace('"', "'").replace(",", "").replace("'", "'"))
>     return True, charstoreturn
> except Exception as err:
>     return False, "error {} on file: {}.\n".format(str(err), pathtofile)
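
One way to narrow down a report like this is to separate "the server returned no text" from "the text handling failed". The sketch below is not part of the original report. It assumes the dict returned by parser.from_file can carry a 'content' value of None when no text was extracted, in which case data['content'].strip() in the snippet above would raise AttributeError and surface only as a generic error string. The helper name summarize_content is hypothetical, and the 'status' key is read defensively in case it is absent.

```python
def summarize_content(data, limit):
    """Return (ok, text) from a tika-python style result dict.

    Hypothetical helper: `data` is assumed to look like the dict returned
    by parser.from_file, i.e. it may contain 'content' and 'status' keys.
    """
    data = data or {}
    content = data.get('content')
    if content is None:
        # Assumption: None means the server extracted no text at all,
        # which is worth reporting separately from a parsing exception.
        return False, "no text extracted (status: {})".format(data.get('status'))
    # split() with no arguments already collapses newlines and extra spaces.
    words = content.split()[:limit]
    text = ' '.join(words).replace('"', "'").replace(",", "")
    return True, text
```

Calling this on the result from both machines would show whether the Windows server returns an empty 'content' or fails somewhere else.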
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)