[jira] [Commented] (CB-13570) FileReader#readAsText fails with multi-byte UTF-8 characters

Timothy Hambourger (JIRA) Tue, 21 Aug 2018 08:45:45 -0700


    [ 
https://issues.apache.org/jira/browse/CB-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587610#comment-16587610
 ]


Timothy Hambourger commented on CB-13570:
-----------------------------------------

Ralf Kistner, thanks for the great report! My organization hit this one in 
production recently. I'm preparing to submit a PR on GitHub shortly. My fix 
currently covers Android, iOS, and Windows (the platforms we target). I haven't 
tested OSX or browser explicitly yet, but it looks like parallel fixes are 
needed on those platforms too. I'll say more in my PR and add a link here.

 

Ralf Kistner, you write, "I have not experienced this issue on Android yet." My 
experience is that the same bug IS present on Android, but shows different 
symptoms. On Android, the UTF-8 decoder by default replaces invalid byte 
sequences with the replacement character U+FFFD. So instead of failing with an 
ENCODING_ERR, the read on Android reports success, but the returned text 
contains two replacement characters in place of each split UTF-8 character, one 
for each half of the split character. iOS and Windows both do a strict UTF-8 
decode, so those platforms report an ENCODING_ERR instead.
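
To make the difference concrete, here is a minimal sketch in JavaScript using 
the standard TextEncoder/TextDecoder APIs. It is not taken from the plugin: the 
sample string and the split point are made up for illustration, the default 
(lenient) decoder stands in for the Android behavior described above, and the 
`fatal: true` decoder stands in for the strict decode on iOS and Windows.

```javascript
// Illustration only: simulate a chunked read that splits a multi-byte character.
// 'œ' (U+0153) is encoded in UTF-8 as the two bytes 0xC5 0x93.
const bytes = new TextEncoder().encode('aœb'); // 0x61 0xC5 0x93 0x62

// Hypothetical chunk boundary that falls in the middle of 'œ'.
const chunks = [bytes.slice(0, 2), bytes.slice(2)];

// Lenient decode: each chunk is decoded on its own, and every invalid or
// truncated sequence is replaced with U+FFFD.
const lenient = chunks
  .map((chunk) => new TextDecoder('utf-8').decode(chunk))
  .join('');

// Strict decode: the first chunk already ends in a truncated sequence,
// so decoding throws instead of returning text.
let strictError = null;
try {
  chunks.map((chunk) => new TextDecoder('utf-8', { fatal: true }).decode(chunk));
} catch (err) {
  strictError = err;
}

console.log(JSON.stringify(lenient)); // 'a', two U+FFFD characters, then 'b'
console.log(strictError instanceof TypeError); // true
```

The lenient result has the same length as the original string plus one, which 
is why a successful read on Android can still hide corrupted text.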

> FileReader#readAsText fails with multi-byte UTF-8 characters
> ------------------------------------------------------------
>
>                 Key: CB-13570
>                 URL: https://issues.apache.org/jira/browse/CB-13570
>             Project: Apache Cordova
>          Issue Type: Bug
>          Components: cordova-plugin-file
>    Affects Versions: 5.0.0, 4.2.0
>         Environment: Tested on:
>  * iOS 10.2
>  * cordova-ios 4.3.0
>  * UIWebView (not WKWebView)
>  * cordova 6.4.0
>  * cordova-plugin-file 4.2.0 and 5.0.0
> (Slightly old cordova version, but the issue seems to be in the plugin)
>            Reporter: Ralf Kistner
>            Priority: Major
>
> `FileReader#readAsText` reads the file in chunks of 256KB. If the file 
> contains a multi-byte UTF-8 character that is split into two separate chunks, 
> reading fails with an encoding error (ENCODING_ERR: 5).
> For many apps this is not an issue. However, if a file is larger than 256KB 
> and contains many multi-byte characters, this is likely to happen.
> I have not experienced this issue on Android yet.
> Code that demonstrates the issue: 
> https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3
> In the example, the read splits the '\u0153' character (UTF-8 bytes 0xC5 
> 0x93) across two chunks, so neither chunk decodes as valid UTF-8.
> A workaround is to use readAsArrayBuffer instead, and do the decoding in 
> JavaScript. However, the decoding can be quite slow on iOS where a native 
> TextDecoder is not available.
> One solution would be to make the chunk sizes semi-flexible, to ensure that 
> it ends on a character boundary (make the chunk larger until decoding 
> succeeds).
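
The semi-flexible chunk size suggested in the report could be sketched roughly 
as follows. This is an illustration in JavaScript under stated assumptions, not 
the plugin's native code: `extendToCharBoundary` and `decodeInChunks` are 
hypothetical names, the input is assumed to be valid UTF-8 already held in a 
Uint8Array, and the boundary is found by skipping UTF-8 continuation bytes 
(bit pattern 10xxxxxx) rather than by retrying the decode.

```javascript
// Hypothetical helper: if `end` points at a UTF-8 continuation byte, the chunk
// would end mid-character, so grow the chunk until it ends on a boundary.
// For valid UTF-8 this adds at most 3 bytes.
function extendToCharBoundary(bytes, end) {
  while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
    end++;
  }
  return end;
}

// Hypothetical chunked decode: nominal chunks of `chunkSize` bytes, each one
// extended to the next character boundary before a strict decode.
function decodeInChunks(bytes, chunkSize) {
  const decoder = new TextDecoder('utf-8', { fatal: true });
  let text = '';
  let start = 0;
  while (start < bytes.length) {
    const nominalEnd = Math.min(start + chunkSize, bytes.length);
    const end = extendToCharBoundary(bytes, nominalEnd);
    text += decoder.decode(bytes.subarray(start, end));
    start = end;
  }
  return text;
}

// Usage: a tiny chunk size forces many boundaries inside multi-byte characters.
const sample = 'aœb€c'.repeat(50);
const sampleBytes = new TextEncoder().encode(sample);
console.log(decodeInChunks(sampleBytes, 4) === sample); // true
```

Where TextDecoder is available, passing `{ stream: true }` to `decode()` is an 
alternative that carries incomplete sequences over to the next call, so the 
chunk boundaries need no adjustment.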



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

