[ https://issues.apache.org/jira/browse/TIKA-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690396#comment-15690396 ]
Ahmad Sawalhah commented on TIKA-2183:
--------------------------------------
Traceback (most recent call last):
  File "D:/NisreenThalgi/Project/frmmain.py", line 80, in startProcessing
    rdFile=ReadFile(self.fname)
  File "D:\NisreenThalgi\Project\ReadFile_2.py", line 33, in __init__
    self.ReadCorpusFile(filename)
  File "D:\NisreenThalgi\Project\ReadFile_2.py", line 37, in ReadCorpusFile
    parsed = parser.from_file( filename)
  File "C:\Python34\lib\site-packages\tika\parser.py", line 25, in from_file
    jsonOutput = parse1('all', filename, serverEndpoint)
  File "C:\Python34\lib\site-packages\tika\tika.py", line 217, in parse1
    verbose, tikaServerJar)
  File "C:\Python34\lib\site-packages\tika\tika.py", line 338, in callServer
    resp = verbFn(serviceUrl, encodedData, headers=headers)
  File "C:\Python34\lib\site-packages\requests\api.py", line 123, in put
    return request('put', url, data=data, **kwargs)
  File "C:\Python34\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python34\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python34\lib\site-packages\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python34\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Python34\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "C:\Python34\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 363, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Python34\lib\http\client.py", line 1137, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1177, in _send_request
    self.putheader(hdr, value)
  File "C:\Python34\lib\http\client.py", line 1109, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-27: ordinal not in range(256)
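
The traceback ends in http.client's putheader, which encodes header values as latin-1, so the failure appears to happen while an HTTP header is being built, not while the document itself is parsed. Below is a minimal sketch of a possible client-side workaround, assuming the Arabic file name is what ends up in that header: check whether a name is latin-1 encodable and, if not, give the parser a temporary copy with an ASCII name. The helpers header_safe and ascii_safe_copy are hypothetical names used for illustration; they are not part of tika-python.

```python
import os
import shutil
import tempfile


def header_safe(value):
    """Return True if `value` can be encoded as latin-1, as putheader requires."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def ascii_safe_copy(path):
    """Copy `path` to a temporary file whose name is ASCII-only.

    Hypothetical helper: the extension is kept when it is itself encodable,
    and the caller is responsible for deleting the copy afterwards.
    """
    ext = os.path.splitext(path)[1]
    if not header_safe(ext):
        ext = ""  # drop an extension that would trigger the same error
    fd, tmp_path = tempfile.mkstemp(suffix=ext)
    os.close(fd)  # mkstemp returns an open descriptor; only the path is needed
    shutil.copyfile(path, tmp_path)
    return tmp_path
```

Under that assumption, calling parser.from_file(ascii_safe_copy(filename)) instead of parser.from_file(filename) may avoid the error until the client handles non-latin-1 names itself. This has not been verified against Tika 1.14.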
> Can't Read file if its name is Arabic
> -------------------------------------
>
> Key: TIKA-2183
> URL: https://issues.apache.org/jira/browse/TIKA-2183
> Project: Tika
> Issue Type: Bug
> Components: general, languageidentifier
> Affects Versions: 1.14
> Reporter: Ahmad Sawalhah
>
> If a file has an Arabic name, such as احمد.docx, parsing it fails with this error:
>   File "C:\Python34\lib\http\client.py", line 1109, in putheader
>     values[i] = one_value.encode('latin-1')
> UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-27: ordinal not in range(256)