Re: Document encoding

Epperlein, Lutz (agendo) via 4D_Tech Mon, 13 Jan 2020 02:44:59 -0800

I think you should first consider what encoding means. In general, file 
encoding refers to the encoding of text files. If you run your code below on 
PDF files, you will reliably destroy them, since PDF files are binary files. 
The same applies to nearly all picture formats (and to Word and Excel files).
Binary files can, however, contain sections of text, and that text has to be 
encoded in some way. To know which sections those are and how they are 
encoded, you have to consult the documentation of the file format. Usually, 
though, the tools that generate such files offer options that let the user 
choose this setting.


Regarding the second part of the question, how to detect the current encoding:
This can be a bit cumbersome, since plain text files don't carry a marker 
saying which encoding is used. Sometimes there is a so-called BOM (byte order 
mark) in the first bytes of the file if it is in a Unicode encoding, but you 
can't rely on it. 
4D can help a bit: if you try to read a text file with the wrong encoding (and 
it contains bytes that can't be decoded), you will get an empty result. But it 
is also possible that the content decodes without error and the result is 
still wrong.
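
The detection idea can be sketched roughly like this, again in Python rather 
than 4D. The function name guess_encoding and the Mac Roman fallback are 
assumptions for the example, not anything 4D provides: look for a BOM first, 
then try a strict UTF-8 decode, and only then fall back to a guess.

```python
import codecs

# BOM signatures. The UTF-32 entries come first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "mac_roman") -> str:
    """Return a codec name: from the BOM if present, else UTF-8 if valid, else fallback."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")  # strict decoding raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # only a guess; the real encoding is unknown


print(guess_encoding(codecs.BOM_UTF8 + b"abc"))
print(guess_encoding("é".encode("utf-8")))
print(guess_encoding("é".encode("mac_roman")))

# Decoding with the wrong single-byte encoding does not fail, it just
# produces a different character.
print("é".encode("mac_roman").decode("latin-1") == "é")
```

The last line is the case I mean above: no error is reported, yet the text is 
wrong, so a successful decode proves nothing for single-byte encodings.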

Regards 
Lutz


-----Original Message-----
From: 4D_Tech [mailto:[email protected]] On Behalf Of Two Way 
Communications via 4D_Tech
Subject: Document encoding

Hi All,

An important customer of mine has requested that all documents sent to him 
be UTF-8 encoded.
This concerns PDF files, text files, Word, Excel, and picture files.

I did some tests, but can’t figure out how to do that.

If, e.g., I look at a pdf file in BBEdit, it says ‘Mac Roman’.

Then I tried to open that file in 4D (v17, UTF-8) with DOCUMENT TO BLOB, 
then:

DOCUMENT TO BLOB(document;$blob)
$DocBlobtxt:=Convert to text($blob;2027)  // 2027 = MacOS Roman
TEXT TO BLOB($DocBlobtxt;$docblobUTF8;UTF8 text without length)

It seems to do that correctly, but the resulting file cannot be opened in 
Preview (it opens, but the content is blank).

The other thing is that I need to know the encoding of the file before using 
‘Convert to text’. That is not always possible.

Is this request feasible to start with?

Any ideas how to accomplish that?               


Regards,

Rudy Mortier
Two Way Communications bvba 

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************
