On 2014-03-09 14:02:48 +0100, Christoph Biedl wrote:
> Vincent Lefevre wrote...
> 
> > On a LaTeX file, one currently gets:
> > 
> >   LaTeX 2e document text
> > 
> > It would be useful to have the encoding too, e.g.
> > 
> >   ISO-8859-1 LaTeX 2e document text
> >   UTF-8 LaTeX 2e document text
> (...)
> 
> From wheezy (5.11) on, file also prints a file encoding, like
> 
> | LaTeX 2e document, UTF-8 Unicode text
> 
> That one is guessed from the file content, not by examination of
> statements like 'inputenc'. Is that sufficient for you?

Yes, more or less. The problem is with ISO-8859 files: the output
doesn't say which part of ISO-8859 is in use. I only use ISO-8859-1,
so this is unambiguous for me, but it can be a problem for widely
distributed filters based on "file" output.

> > On LaTeX files, the encoding can be obtained unambiguously (well,
> > in practice) by looking at \usepackage[...]{inputenc} commands,
> > e.g.
> > 
> >   \usepackage[latin1]{inputenc}
> >   \usepackage[utf8]{inputenc}
> 
> Seems feasible but still requires some hackery using regular
> expressions.

I think that in most cases, these commands occur at the beginning
of a line. Looking for such a command would be useful only in the
ISO-8859 case, to differentiate the various parts.
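
To illustrate the kind of line-anchored match discussed above, here is a
minimal sketch in Python. It is not how "file" works or would implement
this. The function name guess_latex_encoding and the mapping table are
hypothetical, and the table covers only a few inputenc options as an
assumption:

```python
import re

# Assumed, partial mapping from inputenc option names to charset labels.
INPUTENC_TO_CHARSET = {
    "latin1": "ISO-8859-1",
    "latin2": "ISO-8859-2",
    "latin9": "ISO-8859-15",
    "utf8": "UTF-8",
}

# Match \usepackage[...]{inputenc} only at the beginning of a line.
# Working on bytes avoids having to know the encoding in advance.
INPUTENC_RE = re.compile(
    rb"^\\usepackage\[([^\]]*)\]\{inputenc\}", re.MULTILINE
)


def guess_latex_encoding(data: bytes):
    """Return a charset label taken from an inputenc line, or None."""
    match = INPUTENC_RE.search(data)
    if match is None:
        return None
    option = match.group(1).decode("ascii", "replace").strip()
    # Unknown options are returned unchanged rather than guessed at.
    return INPUTENC_TO_CHARSET.get(option, option)


if __name__ == "__main__":
    sample = b"\\documentclass{article}\n\\usepackage[latin1]{inputenc}\n"
    print(guess_latex_encoding(sample))
```

Because the pattern is anchored at the start of a line, a commented-out
command such as "% \usepackage[latin1]{inputenc}" is not matched. Commands
that are indented or that load several packages at once are not handled
by this sketch.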

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

