Sometimes it's outright prohibited, e.g., RFC 8259: "Implementations MUST NOT
add a byte order mark (U+FEFF) to the beginning of a networked-transmitted
JSON text. In the interests of interoperability, implementations that parse
JSON texts MAY ignore the presence of a byte order mark rather than treating
it as an error."
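
A minimal Python sketch of that rule, for illustration only: the receiving side tolerates a leading BOM, the sending side never writes one. The function names are hypothetical; the "utf-8-sig" codec and the json module are from the Python standard library.

```python
import json

def loads_tolerating_bom(raw: bytes):
    """Parse UTF-8 JSON bytes, ignoring one leading U+FEFF if present."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM (EF BB BF) when present.
    text = raw.decode("utf-8-sig")
    return json.loads(text)

def dumps_without_bom(obj) -> bytes:
    """Serialize to UTF-8 JSON bytes; plain "utf-8" never emits a BOM."""
    return json.dumps(obj).encode("utf-8")
```

Passing a str that still starts with U+FEFF straight to json.loads is rejected, which is why the BOM is removed at decode time here.
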
Further, the IETF is moving toward protocols in which UTF-8 is
mandatory, and RFC 3629, section 6, "Byte order mark (BOM)", states:

   In the meantime, the uncertainty unfortunately remains and may affect
   Internet protocols.  Protocol specifications MAY restrict usage of
   U+FEFF as a signature in order to reduce or eliminate the potential
   ill effects of this uncertainty.  In the interest of striking a
   balance between the advantages (reduction of uncertainty) and
   drawbacks (loss of the signature function) of such restrictions, it
   is useful to distinguish a few cases:

   o  A protocol SHOULD forbid use of U+FEFF as a signature for those
      textual protocol elements that the protocol mandates to be always
      UTF-8, the signature function being totally useless in those
      cases.

   o  A protocol SHOULD also forbid use of U+FEFF as a signature for
      those textual protocol elements for which the protocol provides
      character encoding identification mechanisms, when it is expected
      that implementations of the protocol will be in a position to
      always use the mechanisms properly.  This will be the case when
      the protocol elements are maintained tightly under the control of
      the implementation from the time of their creation to the time of
      their (properly labeled) transmission.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3
________________________________________
From: IBM Mainframe Discussion List [[email protected]] on behalf of
Paul Gilmartin [[email protected]]
Sent: Tuesday, July 27, 2021 7:05 PM
To: [email protected]
Subject: Re: FTP distributed system EBCDIC encoded file
On Tue, 27 Jul 2021 17:01:56 -0500, Frank Swarbrick wrote:
>We have a vendor that is providing a file that is EBCDIC (IBM-1140) encoded,
>but also includes an NL record/line terminator. The source system is NOT a
>mainframe system. I'm trying to figure out how to FTP the file to the
>mainframe and have it treat NL as, well, NL; i.e. a record terminator. Binary
>mode (no SITE options) doesn't work because it stores the NL characters.
>ASCII mode (no SITE options) doesn't work, I believe because it still expects
>the CRLF delimiter. I tried specifying "SITE TYPE E" (EBCDIC) and that also
>does not eliminate the NL delimiter.
>
>Any thoughts? We're seeing if the vendor can just not use a delimiter at all,
>but no luck yet.
>
Doesn't z/OS use NL as its line separator? Verify/refute this with:
echo 'foo
bar' | od -tx1
I'd expect you to see:
0000000 86 96 96 15 82 81 99 15
0000010
where the x'15' is the NL. I expect transfer in binary to preserve the NL and
simply work.
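
If the target has to be fixed-length records rather than a byte stream, one option is to reshape the file before a binary transfer. This is only a sketch under assumptions: nl_to_fixed and the lrecl parameter are hypothetical names, every record is assumed to end with x'15', and the data is assumed to be pure text (a packed or binary field could legitimately contain x'15' and would be split wrongly).

```python
EBCDIC_NL = b"\x15"     # NL, the record terminator discussed above
EBCDIC_SPACE = b"\x40"  # EBCDIC blank, used for padding

def nl_to_fixed(data: bytes, lrecl: int) -> bytes:
    """Split EBCDIC bytes at x'15' and pad each record to lrecl bytes."""
    records = data.split(EBCDIC_NL)
    # A trailing NL leaves one empty piece after the last record; drop it.
    if records and records[-1] == b"":
        records.pop()
    out = []
    for rec in records:
        if len(rec) > lrecl:
            raise ValueError(f"record of {len(rec)} bytes exceeds LRECL {lrecl}")
        out.append(rec.ljust(lrecl, EBCDIC_SPACE))
    return b"".join(out)
```

The result contains no delimiters at all, so it could be stored as-is in a dataset whose record length matches lrecl.
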
>Note: They can create it in UTF-8, but they are including the UTF-8 Byte Order
>Mark (BOM). I am able to get z/OS to strip the BOM, but I have to specify the
>transmission as being "multi-byte", so the destination has to be VB. Which we
>can deal with, but we'd prefer FB as that is how we have it from the old
>vendor.
>
Use of a BOM with UTF-8 is generally deprecated.
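
If the vendor cannot stop writing the BOM, it could also be removed from the raw bytes before the transfer. A rough Python sketch (strip_utf8_bom is a hypothetical helper; codecs.BOM_UTF8 is the standard-library constant for EF BB BF):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only a BOM at the very start is dropped; U+FEFF anywhere else in the data is left untouched.
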
-- gil
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN