On Wed, January 18, 2012 17:32, Felipe Monteiro de Carvalho wrote:
> 2012/1/18 Tomas Hajny <xhaj...@hajny.biz>:
>> As pointed out in my other e-mail, "everywhere necessary" implies either
>> "dear user, convert all your files from the original encoding before you
>> want programs created in FPC to touch them"
>
> Yes, no problem here. I assume there must be some program in this
> platform which can edit ASCII text and another one (or the same) to
> convert text files between encodings. If not, just use the new port to
> cross-compile such a program =)

I don't know the typical situation on S/370 machines nowadays, so the
following comment may not be valid. However: how likely would you be to
use some new program if any interchange of files between this program and
whatever other software is used on your machine required you to convert
these files back and forth manually all the time (considering that the
existing software may be the main reason for still using this kind of
platform at all)?

>> or "dear programmer targetting
>> OS/370, make sure that your programs are limited in what RTL functions
>> you use, or convert all locally stored files to ASCII and only use the
>> RTL functions for text processing on the converted copies". Otherwise
>> even stuff like line by line reading or field by field reading of the
>> input text file using standard RTL routines may not work as expected
>> with the current RTL.
>
> I don't see why. A text encoding is just a text encoding, of the
> hundreds of obsolete ones in existence, and the only sane way of
> handling text in cross-platform applications is Unicode.

My point is not about cross-platform applications here. My point is about
applications running natively on zOS / OS/370 / ... (whether these
applications should be cross-platform or not is the second step in my
opinion).

> The RTL could ship with UTF-8 <-> EBCDIC convertor and define UTF-8 as
> the platform encoding. Detect which exact format the platform is using
> at runtime if necessary and convert everywhere necessary. This should
> cover all characters possibly imaginable and all control characters
> too.
>
> What could go wrong here? This is what Java does in all its platforms.

I don't know how it works for Java, but I know that it cannot work
transparently in the current FPC RTL without making at least some changes
in the common parts (platform-independent so far). Most likely something
similar (i.e.
changes to otherwise platform-independent RTL parts) happened to Java too
when it was ported to S/370, or the design of the run-time part included
this kind of consideration from the beginning (which I personally doubt
;-) ).

> As for WriteLn / ReadLn if one really wants to allow inputing directly
> control codes, one could either make them use UTF-8 and offer an
> alternative RawWriteLn / RawReadLn for raw input of control codes or
> leave them sending raw text and expose the routines to convert UTF-8
> <-> EBCDIC

The point is not about the programmer interested in inputting the control
codes directly (although that may be a valid scenario too if the
programmer wants to work the same way he is used to on the other
platforms). The point is that common parts of the FPC RTL have e.g. #9
hard-coded as the tab character (which in turn controls how fields in text
files may be separated from each other) or #10 as the line feed character,
and that this happens at some point during the transition from generic
"binary file I/O" to "text file processing" in common parts of the
standard FPC RTL (as it stands now).

BTW, even if one considered e.g. translation from EBCDIC to UTF-8 in
FileReadFunc (modified from its current standard platform-independent
implementation), because that is the place where generic files start to be
interpreted as textual content in our implementation of the standard
Pascal RTL functions, it would fail in other routines in rtl/inc/text.inc
due to the difference between the length read from the original file
stored in EBCDIC and the size needed in the text file buffer after the
translation to UTF-8.
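
The two problems described above can be illustrated with a minimal sketch.
It is written in Python rather than Pascal and is not taken from the FPC
RTL. It assumes that Python's cp037 codec is a fair stand-in for "EBCDIC"
(the code page actually used on a given system may differ), and the helper
names are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch only, not FPC RTL code.
# Assumption: the cp037 codec stands in for whatever EBCDIC code page
# the target system really uses.
EBCDIC = "cp037"


def control_bytes(encoding: str) -> dict:
    """Return the byte values that encode tab and line feed."""
    return {
        "tab": "\t".encode(encoding)[0],
        "line feed": "\n".encode(encoding)[0],
    }


def split_lines_ascii_style(raw: bytes) -> list:
    """Split on byte 10, as code with a hard-coded #10 would do."""
    return raw.split(bytes([10]))


def transcoded_sizes(text: str) -> tuple:
    """Return (size in EBCDIC bytes, size in UTF-8 bytes) for one text."""
    return len(text.encode(EBCDIC)), len(text.encode("utf-8"))


# Two lines of text as they would be stored in an EBCDIC file.
record = "first\nsecond\n".encode(EBCDIC)

# Tab and line feed are not bytes 9 and 10 in this code page.
print(control_bytes("ascii"))
print(control_bytes(EBCDIC))

# A splitter looking for byte 10 does not find the line ends.
print(len(split_lines_ascii_style(record)))

# The cent sign is one byte in cp037 but two bytes in UTF-8, so a buffer
# sized from the number of bytes read is too small after translation.
print(transcoded_sizes("price: 5\u00a2"))
```

The last call is the case that matters for the buffer handling: any
character outside ASCII grows when translated, so the byte count returned
by the read no longer matches the space the translated text needs.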

Again - I'm sure multiple solutions exist, but I cannot imagine how it
could work reasonably well without touching the common parts of the FPC
RTL at least a bit (but unfortunately at multiple places which may be hard
to find) except for a limited 'proof of concept' solution not meant to be
used in standard ways (obviously, limiting all text file I/O to files
encoded in ASCII or Unicode and not doing any console I/O if it cannot
support ASCII or Unicode directly may be perfectly acceptable in such
'proof of concept' mode).

Tomas

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel