On 2021-03-08 21:36, Martin Frb via fpc-pascal wrote:
 .
 .
In the example the index access should have returned a single
codeunit, which was known to be a complete codepoint.
As far as I understand the unexpected part was, that the unicode
string did not contain the content of the string constant, because the
assignment had caused an encoding conversion to be inserted.
That conversion caused the need for a widestring manager.

Maybe, to help the search for when, where and for what notes/warnings
should/could be produced, those implicit conversions can be broken
down into groups.
I can think of 2 groups already.
1) Conversion due to explicitly declared different encodings.
   AnAnsiString := SomeWideString;
   AnAsciiString := AnUtf8String; // declared as "type AnsiString(CP_ASCII);" and "type AnsiString(CP_UTF8);"

Do you mean a compile-time warning? The trouble is that the compiler cannot know whether a real widestring manager will be included in the final binary when it encounters such conversions. Remember that the final binary may be compiled at a different time than the unit containing the conversions. In other words, compile-time warnings would be rather difficult to implement.

It might be possible to raise an error at runtime when such conversions are invoked, but note that technically the conversion may not lead to incorrect results if the string doesn't contain characters beyond US-ASCII. In other words, a run-time error might be appropriate only if the conversion encounters a character it cannot handle. However, adding such a check would probably slow down processing even when the strings don't contain any problematic characters.
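To illustrate the point about US-ASCII content, here is a minimal sketch of such a check done by hand in user code. The type names TAsciiString and TUtf8String and the helper IsPureAscii are hypothetical, chosen only for this example; this is not how the RTL implements the conversion.

```pascal
program asciiguard;
{$mode objfpc}{$H+}

type
  // Hypothetical names, mirroring the declarations quoted above.
  TAsciiString = type AnsiString(CP_ASCII);
  TUtf8String  = type AnsiString(CP_UTF8);

// Returns True if every byte of S is in the US-ASCII range (0..127).
// RawByteString accepts any code page without triggering a conversion.
function IsPureAscii(const S: RawByteString): Boolean;
var
  I: SizeInt;
begin
  Result := True;
  for I := 1 to Length(S) do
    if Ord(S[I]) > 127 then
      Exit(False);
end;

var
  U: TUtf8String;
  A: TAsciiString;
begin
  U := 'plain text';
  if IsPureAscii(U) then
    A := U  // the compiler inserts an implicit UTF-8 -> US-ASCII conversion here
  else
    WriteLn('Non-ASCII content: the conversion may need a real widestring manager');
  WriteLn(Length(A));
end.
```

Note that the guard scans the whole string before every conversion, which is exactly the kind of overhead mentioned above.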


2) Conversion where at least one string is not explicitly declared for
a certain codepage.
   This should include indirection via $codepage

No, this is not the case. $codepage defines the source file encoding. The compiler translates the string constants declared this way to a UTF-16 constant stored within the compiled binary. Specifying $codepage has no implications on runtime conversions by itself.
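A small sketch of what this means in practice, assuming the source file is saved as UTF-8. The program name and the variable are made up for the example. The constant is assigned to a UnicodeString and only its code units are inspected, so no runtime conversion should be involved.

```pascal
program cpconst;
{$mode objfpc}{$H+}
{$codepage utf8}  // declares the encoding of this source file only

var
  U: UnicodeString;
begin
  // Assumes this file is saved as UTF-8. As described above, the compiler
  // translates this constant to UTF-16 at compile time.
  U := 'ä';
  // Inspect the code units directly instead of printing the string itself,
  // because printing it would require an encoding conversion.
  WriteLn(Length(U));
  WriteLn(Ord(U[1]));
end.
```

Assigning the same constant to an AnsiString variable instead would be a different matter, because a conversion from the stored UTF-16 constant would then be needed.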


Then maybe as a first step, a note/warning could be given, if a
constant string is assigned to a variable, and a change of encoding is
needed for this.
- "constant string" here would be any string that does not have a
directly and explicitly declared encoding.
- This could be given, even if the presence/absence of a widestring
manager is not known. Because

Because what?


Obviously, knowing the presence/absence of a widestring manager allows
refining the warnings.
But I guess that comes at a higher price, as each unit, when compiled,
could only set flags in the ppu (including forwarding flags from used
units).
Compiling the final program would then read which warning flags are
present, and whether any unit flagged the inclusion of a widestring
manager.

Yes, this would indeed be the only possibility.

Tomas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
