[
https://issues.apache.org/jira/browse/CB-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587667#comment-16587667
]
ASF GitHub Bot commented on CB-13570:
-------------------------------------
TimHambourger opened a new pull request #242: CB-13570 -- (android, ios,
windows) Handle multi-byte UTF-8 characters that cross a chunk boundary
URL: https://github.com/apache/cordova-plugin-file/pull/242
<!--
Please make sure the checklist boxes are all checked before submitting the
PR. The checklist
is intended as a quick reference, for complete details please see our
Contributor Guidelines:
http://cordova.apache.org/contribute/contribute_guidelines.html
Thanks!
-->
### Platforms affected
Android, iOS, Windows.
My organization hit this one in production in a highly visible (to the end
user) setting. So I've gone ahead and fixed the 3 platforms we target. I
haven't yet tested on OSX or browser, but looking at the source it looks like
fixes are likely needed on those platforms too. I don't think that needs to
hold up this PR, b/c I've made my change in a backwards compatible way. But
that could be an area to discuss.
### What does this PR do?
Fixes [CB-13570](https://issues.apache.org/jira/browse/CB-13570) on the
specified platforms. Specifically, this PR changes the JS-to-native interface
for the readAsX methods. Previously the JS side expected the native side to
return the read value (be it a text string, ArrayBuffer, data URL, etc.) With
this PR, the JS side now can handle two result formats from the native side:
1. An object like `{ value: any, numBytesConsumed: number }`, where `value`
is the read value (text, ArrayBuffer, etc.) and `numBytesConsumed` is the
number of bytes consumed to read that value. `numBytesConsumed` can differ from
the specified `READ_CHUNK_SIZE` for reasons that will be clear shortly.
1. The previous format, for backwards compatibility with platforms/read
methods that haven't yet had their native sides updated for this change.
Then, on Android, iOS, and Windows, the native side uses this new
flexibility to change its handling for readAsText specifically. Depending on
the specified encoding, if the end offset requested by the JS side would cause
a multi-byte character to get split, the native side extends the end offset as
needed to prevent splitting. The native side then returns an accurate
`numBytesConsumed` to reflect the extra bytes needed.
### What testing has been done on this change?
I added an automated test that exposes the bug and now passes on the fixed
platforms. I also did manual testing in the app that exposed the bug for us.
### Checklist
- [x] [Reported an issue](http://cordova.apache.org/contribute/issues.html)
in the JIRA database
- [x] Commit message follows the format: "CB-3232: (android) Fix bug with
resolving file paths", where CB-xxxx is the JIRA ID & "android" is the platform
affected.
- [x] Added automated test coverage as appropriate for this change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> FileReader#readAsText fails with multi-byte UTF-8 characters
> ------------------------------------------------------------
>
> Key: CB-13570
> URL: https://issues.apache.org/jira/browse/CB-13570
> Project: Apache Cordova
> Issue Type: Bug
> Components: cordova-plugin-file
> Affects Versions: 5.0.0, 4.2.0
> Environment: Tested on:
> * iOS 10.2
> * cordova-ios 4.3.0
> * UIWebView (not WKWebView)
> * cordova 6.4.0
> * cordova-plugin-file 4.2.0 and 5.0.0
> (Slightly old cordova version, but the issue seems to be in the plugin)
> Reporter: Ralf Kistner
> Priority: Major
>
> `FileReader#readAsText` reads the file in chunks of 256KB. If the file
> contains a multi-byte UTF-8 character that is split into two separate chunks,
> reading fails with an encoding error (ENCODING_ERR: 5).
> For many apps this is not an issue. However, if I file is larger than 256KB
> and contains many multi-byte characters, this is likely to happen.
> I have not experienced this issue on Android yet.
> Code that demonstrates the issue:
> https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3
> In the example, the reading will split the '\u0153' character into '...\x01'
> and '\x53', which fails to decode in UTF-8.
> A workaround is to use readAsArrayBuffer instead, and do the decoding in
> JavaScript. However, the decoding can be quite slow on iOS where a native
> TextDecoder is not available.
> One solution would be to make the chunk sizes semi-flexible, to ensure that
> it ends on a character boundary (make the chunk larger until decoding
> succeeds).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]