Ralf Kistner created CB-13570:
---------------------------------
Summary: FileReader#readAsText fails with multi-byte UTF-8 characters
Key: CB-13570
URL: https://issues.apache.org/jira/browse/CB-13570
Project: Apache Cordova
Issue Type: Bug
Components: cordova-plugin-file
Affects Versions: 5.0.0, 4.2.0
Environment: Tested on:
* iOS 10.2
* cordova-ios 4.3.0
* UIWebView (not WKWebView)
* cordova 6.4.0
* cordova-plugin-file 4.2.0 and 5.0.0
(This is a slightly old Cordova version, but the issue appears to be in the plugin.)
Reporter: Ralf Kistner
`FileReader#readAsText` reads the file in chunks of 256KB. If the file contains
a multi-byte UTF-8 character that is split across two chunks, reading fails
with an encoding error (ENCODING_ERR: 5).
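To make the failure mode concrete, here is a minimal TypeScript sketch (not the plugin's code) that walks a UTF-8 byte array in fixed-size chunks and reports every chunk boundary that lands on a UTF-8 continuation byte (bit pattern `10xxxxxx`), i.e. in the middle of a multi-byte character. The names `findSplitBoundaries` and `sampleBytes` are hypothetical, and the 256KB chunk size is assumed to mean 256 × 1024 bytes.

```typescript
// Hypothetical helper: lists the chunk-boundary offsets that would cut a
// multi-byte UTF-8 character in half when `bytes` is read in fixed-size chunks.
function findSplitBoundaries(bytes: Uint8Array, chunkSize: number): number[] {
  const splits: number[] = [];
  // Boundaries sit at multiples of chunkSize; the byte at each boundary
  // is the first byte of the next chunk.
  for (let offset = chunkSize; offset < bytes.length; offset += chunkSize) {
    // A continuation byte (0b10xxxxxx) never starts a character, so a chunk
    // that begins with one was cut mid-character.
    if ((bytes[offset] & 0xc0) === 0x80) {
      splits.push(offset);
    }
  }
  return splits;
}

// Assumption: 256KB interpreted as 256 * 1024 bytes.
const CHUNK_SIZE = 256 * 1024;

// Hypothetical test data: one ASCII byte followed by `charCount` copies of
// U+0153 (UTF-8: C5 93), so characters start at odd offsets and every even
// offset after 0 holds a continuation byte.
function sampleBytes(charCount: number): Uint8Array {
  const bytes = new Uint8Array(1 + 2 * charCount);
  bytes[0] = 0x61; // 'a'
  for (let i = 0; i < charCount; i++) {
    bytes[1 + 2 * i] = 0xc5;
    bytes[2 + 2 * i] = 0x93;
  }
  return bytes;
}
```

For example, `findSplitBoundaries(sampleBytes(200000), CHUNK_SIZE)` covers 400,001 bytes and reports a single split at offset 262144, where the second chunk would start with a continuation byte.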
For many apps this is not an issue. However, if a file is larger than 256KB and
contains many multi-byte characters, this is likely to happen.
I have not experienced this issue on Android yet.
Code that demonstrates the issue:
https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3
In the example, reading splits the 'œ' ('\u0153') character, which UTF-8 encodes
as the two bytes 0xC5 0x93, into '...\xC5' and '\x93'; neither chunk decodes as
valid UTF-8.
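For reference, the byte values for U+0153 follow directly from the standard UTF-8 bit layout. The minimal TypeScript sketch below encodes a single code point in the Basic Multilingual Plane; `encodeBmpCodePoint` is a hypothetical name, and surrogate code points are rejected rather than handled.

```typescript
// Encodes one code point in U+0000..U+FFFF (excluding surrogates) as UTF-8
// bytes, using the standard bit layout:
//   U+0000..U+007F -> 0xxxxxxx
//   U+0080..U+07FF -> 110xxxxx 10xxxxxx
//   U+0800..U+FFFF -> 1110xxxx 10xxxxxx 10xxxxxx
function encodeBmpCodePoint(cp: number): number[] {
  if (cp < 0 || cp > 0xffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("expected a non-surrogate BMP code point");
  }
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
}

// U+0153 = 00101 010011 (11 bits) -> 110_00101 10_010011 = 0xC5 0x93
const oeBytes = encodeBmpCodePoint(0x153);
```

A chunk that ends right after 0xC5 holds a lead byte with no continuation, and the next chunk starts with 0x93, a continuation byte with no lead, so each chunk is invalid UTF-8 on its own.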
A workaround is to use readAsArrayBuffer instead and decode the bytes in
JavaScript. However, this decoding can be quite slow on iOS, where a native
TextDecoder is not available.
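A sketch of that workaround, assuming the ArrayBuffer is processed chunk by chunk: decode each chunk in JavaScript and carry any incomplete trailing sequence over to the next one. Where a native `TextDecoder` exists, `decoder.decode(chunk, { stream: true })` provides the same carry-over; the hypothetical `Utf8ChunkDecoder` class below is a pure TypeScript fallback for environments without it. It assumes well-formed UTF-8 and performs no validation or replacement-character handling.

```typescript
// Pure-JS UTF-8 decoder that tolerates characters split across chunk boundaries.
// Assumes well-formed input; malformed sequences are not detected.
class Utf8ChunkDecoder {
  private pending: number[] = []; // bytes of an incomplete trailing sequence

  // Number of bytes in the sequence introduced by a lead byte.
  private static seqLength(lead: number): number {
    if (lead < 0x80) return 1;
    if ((lead & 0xe0) === 0xc0) return 2;
    if ((lead & 0xf0) === 0xe0) return 3;
    return 4; // assumes a well-formed 0b11110xxx lead byte
  }

  decode(chunk: Uint8Array): string {
    // Prepend the bytes held back from the previous chunk.
    const bytes: number[] = this.pending.slice();
    for (let j = 0; j < chunk.length; j++) bytes.push(chunk[j]);

    let out = "";
    let i = 0;
    while (i < bytes.length) {
      const len = Utf8ChunkDecoder.seqLength(bytes[i]);
      if (i + len > bytes.length) break; // incomplete: keep for the next chunk
      // Keep only the payload bits of the lead byte, then append 6 bits
      // from each continuation byte.
      let cp = len === 1 ? bytes[i] : bytes[i] & (0xff >> (len + 1));
      for (let k = 1; k < len; k++) cp = (cp << 6) | (bytes[i + k] & 0x3f);
      if (cp > 0xffff) {
        // Code points above the BMP become a UTF-16 surrogate pair.
        const v = cp - 0x10000;
        out += String.fromCharCode(0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff));
      } else {
        out += String.fromCharCode(cp);
      }
      i += len;
    }
    this.pending = bytes.slice(i);
    return out;
  }

  // Should be 0 after the final chunk of a well-formed file.
  pendingByteCount(): number {
    return this.pending.length;
  }
}
```

Feeding `[0x61, 0xC5]` and then `[0x93, 0x62]` returns "a" and then "œb": the lone 0xC5 is held back until its continuation byte arrives.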
One solution would be to make the chunk sizes semi-flexible, so that each chunk
ends on a character boundary (grow the chunk until decoding succeeds).
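The proposal above finds the boundary by retrying the decode. A cheaper variant, swapped in here as an assumption rather than anything the plugin does, inspects the bytes directly: UTF-8 continuation bytes always match `10xxxxxx`, so a chunk end can be pushed forward past them until it lands on the start of a character. This TypeScript sketch only illustrates the boundary logic (the plugin's native iOS code is not shown), and `chunkRanges` is a hypothetical name.

```typescript
// Splits `bytes` into [start, end) ranges of roughly `chunkSize` bytes,
// extending each end so that no multi-byte UTF-8 character is cut in half.
function chunkRanges(bytes: Uint8Array, chunkSize: number): Array<[number, number]> {
  if (chunkSize < 1) throw new RangeError("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  let start = 0;
  while (start < bytes.length) {
    let end = Math.min(start + chunkSize, bytes.length);
    // A continuation byte (0b10xxxxxx) at `end` means the character that began
    // before `end` is unfinished, so grow the chunk one byte at a time
    // (at most 3 extra bytes for well-formed UTF-8).
    while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
      end++;
    }
    ranges.push([start, end]);
    start = end;
  }
  return ranges;
}
```

Every range then decodes on its own, at the cost of chunks up to three bytes larger than `chunkSize`. Compared with trial decoding, the byte check never decodes a chunk more than once.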
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)