Hi all,

I have been looking at the charset encoders/decoders for EBCDIC (IBM1047) as a result of HARMONY-6290, and I noticed that the character mappings appear to differ slightly from those originally generated by the TableGenerator tool contributed as part of HARMONY-3593.

When I run the tool on my local machine using the RI, I get byte 0x15 (NEL) mapped to 0x0A (Unicode LF) and byte 0x25 (LF) mapped to 0x85 (Unicode NEL). However, the Harmony tables have these values the other way around, i.e. byte 0x15 mapped to 0x85 and byte 0x25 mapped to 0x0A. So it appears that our character mapping currently differs from the RI's. I have opened [1] for this issue and attached a patch to alter our mapping to match the RI.
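For anyone who wants to check the two bytes on their own runtime, here is a minimal sketch (not the TableGenerator tool itself). The class and method names are made up for illustration, and it assumes the runtime ships a charset registered under the name "IBM1047"; if it does not, the program just says so.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Harmony or the RI.
public class EbcdicNewlineCheck {

    // Decode a single byte with the given charset and return the first
    // UTF-16 code unit of the result, or -1 if nothing was produced.
    static int decodeByte(Charset cs, int b) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.isEmpty() ? -1 : s.charAt(0);
    }

    public static void main(String[] args) {
        String name = "IBM1047"; // assumed to be available; checked below
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available on this runtime");
            return;
        }
        Charset cs = Charset.forName(name);
        // The two bytes whose mappings differ between the Harmony tables and the RI.
        System.out.printf("0x15 -> U+%04X%n", decodeByte(cs, 0x15));
        System.out.printf("0x25 -> U+%04X%n", decodeByte(cs, 0x25));
    }
}
```

Going by the results above, I would expect the RI to report U+000A for 0x15 and U+0085 for 0x25, and the current Harmony tables to report the reverse.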

Before I make the commit, are there any objections/comments on this?

Regards,
Oliver

[1] https://issues.apache.org/jira/browse/HARMONY-6294


Oliver Deakin (JIRA) wrote:
     [ https://issues.apache.org/jira/browse/HARMONY-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oliver Deakin resolved HARMONY-6290.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 5.0M11
         Assignee: Oliver Deakin

Fix and test case applied with a minor change at repo revision r801230 - please
check that it was applied as expected.

BufferedReader.readLine() breaks at EBCDIC newline, violating the spec
----------------------------------------------------------------------

                Key: HARMONY-6290
                URL: https://issues.apache.org/jira/browse/HARMONY-6290
            Project: Harmony
         Issue Type: Bug
         Components: Classlib
        Environment: SVN Revision: 800827
           Reporter: Jesse Wilson
           Assignee: Oliver Deakin
            Fix For: 5.0M11

        Attachments: readLine_no_EBCDIC.patch

  Original Estimate: 0.33h
 Remaining Estimate: 0.33h

The spec says that BufferedReader.readLine() considers only "\r", "\n" and "\r\n" to be line separators. We must not permit additional separator characters. I admit that the RI's behaviour is surprising, and incompatible with its own Pattern and Scanner classes. But this is the specified behaviour; the doc explicitly calls out which character sequences are used as newlines. It does not permit additional characters to break lines. For users reading EBCDIC-encoded files, a better practice is to read through the files using a Scanner. That way, the application will behave the same when executed on either Harmony or on the RI.
#Android
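
To illustrate the difference described above, here is a small sketch that splits the same text with BufferedReader.readLine() and with Scanner. The class and method names are hypothetical; only standard java.io and java.util calls are used. The input contains a Unicode NEL (U+0085), which is what an EBCDIC newline decodes to, followed later by a plain "\n".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not taken from the attached patch.
public class LineSplitDemo {

    static final char NEL = 0x85; // Unicode NEXT LINE

    // Collect lines using BufferedReader.readLine().
    static List<String> viaReadLine(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Collect lines using Scanner.nextLine().
    static List<String> viaScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            while (s.hasNextLine()) {
                lines.add(s.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "first" + NEL + "second\nthird";
        System.out.println("readLine: " + viaReadLine(text).size() + " lines");
        System.out.println("Scanner:  " + viaScanner(text).size() + " lines");
    }
}
```

On an implementation that follows the readLine() spec, the first count should be 2, because NEL is not treated as a separator there, while Scanner should report 3 because it also breaks at NEL.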


--
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
