At 03:46 PM 3/3/01 -0000, Tomas Frydrych wrote:
>Vlad wrote:
>> I think that the best solution is to save in utf8 or as numeric 
>> entities under latin1 and CJK locales (I don't know what to prefer 
>> here), and save in native encoding on other locales. 
>
>I agree with Vlad, this is not only a question of being able to edit 
>files in a plain text editor on non-latin1 locales, or of their size, but 
>also of being able to use utilities such as grep on these files. I 
>personally would prefer utf8 to the entities, because of the resulting 
>file size with the entities.

Does anyone else have a preference on this?  

For CJK, entitizing certainly seems ridiculous, but I'm no CJK expert.  

As for Latin-1, I suppose the closest analogue to the "native encoding" 
precedent would be to revert to Jeff's "entitize non-ASCII" solution, rather 
than go to full utf8.  I know that I personally have gotten totally used to 
the occasional entity here & there (in our file format as well as HTML and 
others), so the bloat's not a factor for me.  Are other text-munging tools 
for Latin-1 more likely to cope well with utf8 or entitized text?  (Specific 
examples would help here.)
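
To make the trade-off concrete, here is a minimal Python sketch that writes 
the same Latin-1 text both ways and compares the byte counts and what a 
byte-oriented search would find.  It is an illustration only, not taken from 
any existing code; the helper names entitize_non_ascii and as_utf8 and the 
sample string are made up for the purpose.

```python
# Illustrative sketch only; helper names and sample text are hypothetical.

def entitize_non_ascii(text: str) -> bytes:
    """Keep ASCII as-is; replace every other character with a decimal
    numeric entity such as &#233;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def as_utf8(text: str) -> bytes:
    """Encode the text as UTF-8."""
    return text.encode("utf-8")


sample = "café naïve"          # 10 characters, 2 of them non-ASCII

entitized = entitize_non_ascii(sample)
utf8 = as_utf8(sample)
latin1 = sample.encode("latin-1")

# Size comparison: each non-ASCII Latin-1 character costs 1 byte natively,
# 2 bytes in UTF-8, and 6 bytes as a decimal entity.
print(len(latin1), len(utf8), len(entitized))

# Search comparison: a tool that searches for the UTF-8 bytes of "é"
# finds them in the UTF-8 file but not in the entitized one.
utf8_needle = "é".encode("utf-8")
print(utf8_needle in utf8, utf8_needle in entitized)

# A tool that searches for the raw Latin-1 byte finds it in neither form.
latin1_needle = "é".encode("latin-1")
print(latin1_needle in utf8, latin1_needle in entitized)

# The ASCII parts of a word remain searchable in both forms.
print(b"caf" in utf8, b"caf" in entitized)
```

The last two checks are the relevant ones for the grep question: a tool that 
only understands Latin-1 bytes matches the accented character in neither 
output, so the choice mostly comes down to size and to whether the tool can 
be told to treat its input as utf8.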

I guess my temptation would be to entitize Latin-1, but I don't have such a 
strong preference that I want to block consensus. 

Paul

