Re: Character Encodings - No Statement

Sloane, Brandon Mon, 17 Jun 2019 14:43:46 -0700

The field it occurs in is fixed-length, so a padding character makes sense.



I am a bit concerned about the implications of using a character that looks 
like a space. This type of character homoglyph seems like a potential source of 
errors for people using the schema. Assuming we are correct that this character 
is intended as padding, we can probably avoid this issue by advising schema 
writers to specify U+00A0 as a padding character, so it doesn't actually make 
it into the infoset.
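
To illustrate the concern and the proposed mitigation, here is a minimal Python
sketch. It assumes the padding is trailing and that the field is six characters
wide; both are assumptions for illustration, as are the names `trim_padding`
and `raw`. It is not Daffodil code.

```python
NBSP = "\u00a0"   # NO-BREAK SPACE, the proposed stand-in for NO STATEMENT
SPACE = "\u0020"  # ordinary space, visually identical in most fonts


def trim_padding(field: str, pad: str = NBSP) -> str:
    """Drop trailing pad characters only; ordinary spaces are kept."""
    return field.rstrip(pad)


# Hypothetical 6-character fixed-length field: two letters, one real space,
# then three pad characters.
raw = "AB" + SPACE + NBSP * 3

value = trim_padding(raw)
print(repr(value))    # the real space survives, the padding does not
print(NBSP == SPACE)  # False: the two characters only look alike
```

Note that Python's bare `str.strip()` treats U+00A0 as whitespace, so the pad
character is passed explicitly to keep ordinary spaces intact.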

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Monday, June 17, 2019 5:17:20 PM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

This sounds like fixed length data fields, or min-length data fields. So the 
character to use wants to be similar in concept to the pad character - i.e., it 
is used to add length to a fixed length field, but has no significance.


I suggest using U+00A0, which is "No-Break Space". This is a space for all 
practical purposes, differing only in how it is treated by line-breaking 
algorithms. Using this instead of regular space will allow this data to 
round-trip.


This character should render like a space in every Unicode-aware context.
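
The round-trip argument can be sketched in Python as follows. The bitcodes
below are hypothetical (the real DFI 311 DUI 002 values are not reproduced
here), and the sketch assumes the encoding also has a separate code for an
ordinary space.

```python
# Hypothetical bitcode tables, for illustration only.
DISTINCT = {
    0: "\u00a0",  # NO STATEMENT -> NO-BREAK SPACE
    1: "\u0020",  # SPACE
    2: "A",
}
# Same table, but NO STATEMENT collapsed onto the ordinary space.
COLLAPSED = {**DISTINCT, 0: "\u0020"}


def round_trips(table, codes):
    """Decode codes to text, re-encode, and check we get the same codes."""
    inverse = {ch: code for code, ch in table.items()}
    text = "".join(table[c] for c in codes)
    return [inverse[ch] for ch in text] == list(codes)


codes = [2, 0, 1, 0]
print(round_trips(DISTINCT, codes))   # True: every code has its own character
print(round_trips(COLLAPSED, codes))  # False: codes 0 and 1 are indistinguishable
```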

________________________________
From: Sloane, Brandon <bslo...@tresys.com>
Sent: Monday, June 17, 2019 4:55:09 PM
To: dev@daffodil.apache.org
Subject: Character Encodings - No Statement

I am going through Link 16 (MIL-STD-6016E, not publicly available) to add 
support for some of the special character encodings to Daffodil (similar to 
dfi264:dui001 that has already been added).


While doing so, I came across DFI 311 DUI 002. Several bitcodes are 
"UNDEFINED", which I intend to translate into U+FFFD ('�' replacement 
character), which is what we are doing for 264:001.
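
As a rough Python sketch of that approach (not the Daffodil implementation),
with a made-up table standing in for the defined bitcodes:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# Hypothetical table of defined bitcodes; the real values are in the standard.
DEFINED = {0: "A", 1: "B", 2: "C"}


def decode(codes):
    """Map each bitcode to its character, or U+FFFD if it is undefined."""
    return "".join(DEFINED.get(code, REPLACEMENT) for code in codes)


text = decode([0, 7, 2])  # 7 is not in the table
print(text == "A\ufffdC")  # True
```

One consequence of this choice is that every undefined bitcode decodes to the
same character, so the original bitcode cannot be recovered from the text.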


However, there is also an explicit coding for a NO STATEMENT character. Any 
insight into what a reasonable choice for translating NO STATEMENT to Unicode is?


Regards,


Brandon T. Sloane

Associate, Services

bslo...@tresys.com | tresys.com
