The field it occurs in is fixed-length, so a padding character makes sense.
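
To make the padding idea concrete, here is a minimal sketch in Python. It is not how Daffodil implements anything. The code table is invented for illustration (the real bitcodes are in the spec and are not reproduced here), and `decode` and `strip_padding` are hypothetical helper names. It assumes left-justified fields padded on the right, and uses U+00A0 as the stand-in for NO STATEMENT, as suggested in the quoted message.

```python
NBSP = "\u00a0"         # U+00A0 NO-BREAK SPACE, proposed stand-in for NO STATEMENT
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, used for UNDEFINED bitcodes

# Hypothetical table: these code values are invented for illustration only.
TABLE = {0: NBSP, 1: "A", 2: "B"}


def decode(codes):
    """Map each code to a character; codes missing from the table become U+FFFD."""
    return "".join(TABLE.get(code, REPLACEMENT) for code in codes)


def strip_padding(text, pad=NBSP):
    """Drop trailing pad characters, roughly as a pad-aware parser might."""
    return text.rstrip(pad)


raw = decode([1, 2, 0, 0])   # fixed-length field of four codes
value = strip_padding(raw)   # what would reach the infoset if U+00A0 is the pad
```

Note that `NBSP == " "` is false even though the two render alike, which is the source of the confusion if the character is left in the value.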
I am a bit concerned about the implications of using a character that looks like a space. This kind of look-alike character (a homoglyph) seems like a potential source of errors for people using the schema. Assuming we are correct that this character is intended as padding, we can probably avoid the issue by advising schema writers to specify U+00A0 as the padding character, so that it doesn't actually make it into the infoset.

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Monday, June 17, 2019 5:17:20 PM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

This sounds like fixed length data fields, or min-length data fields. So the character to use wants to be similar in concept to the pad character - i.e., it is used to add length to a fixed length field, but has no significance.

I suggest using U+00A0, which is "Non Break Space". This is a space for all practical purposes, differing only in how it is treated by hyphenation algorithms. Using this instead of a regular space will allow this data to round-trip. This character should render like a space in every Unicode-aware context.

________________________________
From: Sloane, Brandon <bslo...@tresys.com>
Sent: Monday, June 17, 2019 4:55:09 PM
To: dev@daffodil.apache.org
Subject: Character Encodings - No Statement

I am going through link16 (mil-std-6016e, not publicly available) to add support for some of the special character encodings to Daffodil (similar to dfi264:dui001, which has already been added).

While doing so, I came across DFI 311 DUI 002. Several bitcodes are "UNDEFINED", which I intend to translate into U+FFFD ('�' replacement character), which is what we are doing for 264:001. However, there is also an explicit coding for a NO STATEMENT character.

Any insight into what a reasonable choice for translating NO STATEMENT to Unicode would be?

Regards,
Brandon T. Sloane
Associate, Services
bslo...@tresys.com | tresys.com