[jira] [Commented] (CB-13570) FileReader#readAsText fails with multi-byte UTF-8 characters

ASF GitHub Bot (JIRA) Tue, 21 Aug 2018 09:17:28 -0700


    [ 
https://issues.apache.org/jira/browse/CB-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587667#comment-16587667
 ]


ASF GitHub Bot commented on CB-13570:
-------------------------------------

TimHambourger opened a new pull request #242: CB-13570 -- (android, ios, 
windows) Handle multi-byte UTF-8 characters that cross a chunk boundary
URL: https://github.com/apache/cordova-plugin-file/pull/242
 
 
   
   ### Platforms affected
   Android, iOS, Windows.
   
   My organization hit this bug in production, in a setting highly visible to the 
end user, so I've gone ahead and fixed the 3 platforms we target. I haven't yet 
tested on OSX or browser, but from the source it looks like fixes are likely 
needed on those platforms too. I don't think that needs to hold up this PR, 
because I've made my change in a backwards-compatible way, but it could be an 
area to discuss.
   
   ### What does this PR do?
   Fixes [CB-13570](https://issues.apache.org/jira/browse/CB-13570) on the 
specified platforms. Specifically, this PR changes the JS-to-native interface 
for the readAsX methods. Previously the JS side expected the native side to 
return the read value (be it a text string, ArrayBuffer, data URL, etc.). With 
this PR, the JS side now can handle two result formats from the native side:
   1. An object like `{ value: any, numBytesConsumed: number }`, where `value` 
is the read value (text, ArrayBuffer, etc.) and `numBytesConsumed` is the 
number of bytes consumed to read that value. `numBytesConsumed` can differ from 
the specified `READ_CHUNK_SIZE` for reasons that will be clear shortly.
   1. The previous format, for backwards compatibility with platforms/read 
methods that haven't yet had their native sides updated for this change.
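
The two result formats above could be told apart on the JS side roughly as follows. This is a minimal sketch, not the code from the PR: `normalizeReadResult` and `requestedBytes` are hypothetical names, and the sketch assumes a legacy result is never itself an object carrying both a `value` and a numeric `numBytesConsumed` field.

```javascript
// Hypothetical helper (not from the plugin source): accept either the new
// { value, numBytesConsumed } shape or a bare legacy value.
function normalizeReadResult(result, requestedBytes) {
    var isNewFormat = result !== null &&
        typeof result === 'object' &&
        !(result instanceof ArrayBuffer) &&
        'value' in result &&
        typeof result.numBytesConsumed === 'number';

    if (isNewFormat) {
        // Updated native side: trust its byte count, which may differ from
        // the number of bytes the JS side asked for.
        return { value: result.value, numBytesConsumed: result.numBytesConsumed };
    }

    // Legacy native side: the result is the read value itself, and exactly
    // the requested number of bytes is assumed to have been consumed.
    return { value: result, numBytesConsumed: requestedBytes };
}
```

The caller would then advance its read position by `numBytesConsumed` rather than by a fixed chunk size.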
   
   Then, on Android, iOS, and Windows, the native side uses this new 
flexibility to change its handling for readAsText specifically. Depending on 
the specified encoding, if the end offset requested by the JS side would cause 
a multi-byte character to get split, the native side extends the end offset as 
needed to prevent splitting. The native side then returns an accurate 
`numBytesConsumed` to reflect the extra bytes needed.
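
For UTF-8, the end-offset extension described above can be illustrated as follows. The real fixes live in the native Android, iOS, and Windows code; this JavaScript sketch only shows the idea, `extendToUtf8Boundary` is a hypothetical name, and it assumes well-formed UTF-8 and a start offset that already sits on a character boundary.

```javascript
// Hypothetical helper: return an end offset (exclusive) that is >= `end`
// and does not fall inside a multi-byte UTF-8 sequence.
// Assumes `bytes` is a Uint8Array holding well-formed UTF-8.
function extendToUtf8Boundary(bytes, end) {
    var adjusted = Math.min(end, bytes.length);
    // UTF-8 continuation bytes match the bit pattern 10xxxxxx. If the first
    // byte after the chunk is one, the chunk would stop mid-character, so
    // keep extending until the next byte starts a new character.
    while (adjusted < bytes.length && (bytes[adjusted] & 0xC0) === 0x80) {
        adjusted++;
    }
    return adjusted;
}

// 'a', U+0153 (encoded as 0xC5 0x93), 'b'
var sample = new Uint8Array([0x61, 0xC5, 0x93, 0x62]);
var start = 0;
var adjustedEnd = extendToUtf8Boundary(sample, 2); // requested end splits U+0153
var numBytesConsumed = adjustedEnd - start;        // 3 instead of the requested 2
```

Other encodings need their own boundary rule, which is why the behaviour depends on the specified encoding.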
   
   
   ### What testing has been done on this change?
   I added an automated test that exposes the bug and now passes on the fixed 
platforms. I also did manual testing in the app that exposed the bug for us.
   
   ### Checklist
   - [x] [Reported an issue](http://cordova.apache.org/contribute/issues.html) 
in the JIRA database
   - [x] Commit message follows the format: "CB-3232: (android) Fix bug with 
resolving file paths", where CB-xxxx is the JIRA ID & "android" is the platform 
affected.
   - [x] Added automated test coverage as appropriate for this change.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> FileReader#readAsText fails with multi-byte UTF-8 characters
> ------------------------------------------------------------
>
>                 Key: CB-13570
>                 URL: https://issues.apache.org/jira/browse/CB-13570
>             Project: Apache Cordova
>          Issue Type: Bug
>          Components: cordova-plugin-file
>    Affects Versions: 5.0.0, 4.2.0
>         Environment: Tested on:
>  * iOS 10.2
>  * cordova-ios 4.3.0
>  * UIWebView (not WKWebView)
>  * cordova 6.4.0
>  * cordova-plugin-file 4.2.0 and 5.0.0
> (Slightly old cordova version, but the issue seems to be in the plugin)
>            Reporter: Ralf Kistner
>            Priority: Major
>
> `FileReader#readAsText` reads the file in chunks of 256KB. If the file 
> contains a multi-byte UTF-8 character that is split into two separate chunks, 
> reading fails with an encoding error (ENCODING_ERR: 5).
> For many apps this is not an issue. However, if a file is larger than 256KB 
> and contains many multi-byte characters, this is likely to happen.
> I have not experienced this issue on Android yet.
> Code that demonstrates the issue: 
> https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3
> In the example, the reading will split the '\u0153' character into '...\xC5' 
> and '\x93', which fails to decode in UTF-8.
> A workaround is to use readAsArrayBuffer instead, and do the decoding in 
> JavaScript. However, the decoding can be quite slow on iOS where a native 
> TextDecoder is not available.
> One solution would be to make the chunk sizes semi-flexible, to ensure that 
> it ends on a character boundary (make the chunk larger until decoding 
> succeeds).
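
The failure mode and the readAsArrayBuffer workaround described in the issue can be sketched as follows, assuming an environment where `TextDecoder` is available (the issue notes that a native one was not available on the reporter's iOS setup). The function names and the two-chunk sample are hypothetical; the bytes are the UTF-8 encoding of 'a', U+0153, 'b', split in the middle of U+0153.

```javascript
// Two chunks whose boundary falls inside the 2-byte encoding of U+0153.
var chunk1 = new Uint8Array([0x61, 0xC5]);
var chunk2 = new Uint8Array([0x93, 0x62]);

// Decoding each chunk on its own reproduces the reported failure: a strict
// decoder throws on the incomplete sequence at the end of the first chunk.
function decodeChunksIndependently(chunks) {
    return chunks.map(function (chunk) {
        return new TextDecoder('utf-8', { fatal: true }).decode(chunk);
    }).join('');
}

// Workaround sketch: one decoder in streaming mode carries the partial
// sequence over from one chunk to the next.
function decodeChunksStreaming(chunks) {
    var decoder = new TextDecoder('utf-8', { fatal: true });
    var text = '';
    chunks.forEach(function (chunk) {
        text += decoder.decode(chunk, { stream: true });
    });
    text += decoder.decode(); // flush; throws if a sequence is left incomplete
    return text;
}

var decoded = decodeChunksStreaming([chunk1, chunk2]); // 'a\u0153b'
```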



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
