Your message dated Sat, 08 Jul 2017 11:26:55 +0200
with message-id <5330.1499506015@tremalking>
has caused the report #866716,
regarding dblatex: Problem processing non-ascii title received from pandoc
to be marked as having been forwarded to the upstream software
author(s) "Benoit Guillon" <[email protected]>
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
866716: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866716
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Petter Reinholdtsen <[email protected]> wrote:
Hi Benoît,
I want to inform you about Debian dblatex bug report #866716 [1], which
is similar to report #856123 [2]:
> Hi. I am not sure if this is really a dblatex or a pandoc problem, but
> start with dblatex as that is the program outputting the error message.
>
> When I start with a markdown page with a non-ascii character in a title,
> dblatex refuses to process the docbook generated from the page using
> pandoc. The generated docbook file is UTF-8, as is the original
> markdown file. Here is an example file that demonstrate the problem:
pandoc (like asciidoc) tends to generate id attributes containing
non-ASCII characters, which dblatex treats in a way that the default
backend (pdflatex) can't handle. Using the xetex backend instead
avoids these problems.
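To see up front which ids are affected, a rough sketch like the following
could be used. It is only an illustration: the names non_ascii_ids and
SAMPLE are made up here, the sample is a shortened stand-in for the real
file, and it assumes DocBook 4 style plain "id" attributes as pandoc
emits them.

```python
import xml.etree.ElementTree as ET


def non_ascii_ids(docbook_xml: str) -> list:
    """Return every id attribute value that contains a non-ASCII character."""
    root = ET.fromstring(docbook_xml)
    return [
        elem.get("id")
        for elem in root.iter()
        if elem.get("id") is not None and not elem.get("id").isascii()
    ]


# Shortened, hypothetical stand-in for the pandoc output (not the real file).
SAMPLE = """<sect1 id="demonstrate-problem">
  <title>Demonstrate problem</title>
  <sect2 id="ártica">
    <title>Ártica</title>
    <para>Text.</para>
  </sect2>
</sect1>"""

print(non_ascii_ids(SAMPLE))
```

Any id reported this way is a candidate for the pdflatex failure
described here.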
A closer look at the tex file [3] generated by dblatex from the docbook
file [4] shows one suspicious line:
\label{ártica}\hyperlabel{ártica}%
The docbook file is encoded in UTF-8 and the tex file in ISO-8859;
in this line, however, the character "Á" did not get
re-encoded. After manually fixing the line to
\label{Ártica}\hyperlabel{Ártica}%
pdflatex succeeds.
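The mismatch can be illustrated at the byte level. This is only a sketch:
it assumes the label text is the lower-case id "ártica" shown in the line
above, and it uses Python's "latin-1" codec as a stand-in for what the
latin1 option of inputenc expects.

```python
label = "ártica"  # assumed label text, taken from the suspicious line

utf8_bytes = label.encode("utf-8")      # bytes as stored in the UTF-8 docbook file
latin1_bytes = label.encode("latin-1")  # bytes a Latin-1 encoded tex file should hold

print(utf8_bytes)    # b'\xc3\xa1rtica'
print(latin1_bytes)  # b'\xe1rtica'

# Interpreting the UTF-8 bytes as Latin-1 turns the single accented letter
# into two characters, so the label no longer has the intended content.
misread = utf8_bytes.decode("latin-1")
print(len(label), len(misread))  # 6 7
```

A line that was not re-encoded carries the two-byte sequence, while the
rest of the file uses one byte per accented character.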
@petter:
Regarding the dblatex error message:
On transformation errors dblatex calls a Debian-specific error handler,
which knows about some typical error situations and tries to give hints
for resolving them. The first test checks whether the input is
valid DocBook, as dblatex can't be expected to handle non-DocBook files
properly and as violations of the DocBook structure may result in
invalid tex files, e.g. imagine a <table> element without children.
Unfortunately the XML output of your example is missing the DTD
declaration, thus validation fails. However, in your case this is not
relevant: when using the xetex backend dblatex will succeed and not call
the error handler.
Unfortunately asciidoc and pandoc don't seem to include a DTD
declaration in their XML output, thus the dblatex error handler is not
well suited for those tools.
Another problem is BTS #867387 [5]: when prepending the following DTD
declaration to your example:
<?xml version="1.0"?>
<!DOCTYPE sect1
PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
validation with the following command succeeds:
$ xmllint --noout --valid bad-title.with-dtd.xml
$ echo $?
0
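The prepending step could be scripted roughly as follows. This is a
sketch under assumptions: the helper name with_doctype is made up, the
root element is hard-coded to sect1 as in the declaration above, and the
input is assumed to be a pandoc fragment without an XML declaration of
its own.

```python
DOCTYPE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE sect1\n'
    'PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n'
    '"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">\n'
)


def with_doctype(xml_text: str) -> str:
    """Prepend the DocBook 4.5 declaration unless a DOCTYPE is already present."""
    if "<!DOCTYPE" in xml_text:
        return xml_text
    # Assumption: xml_text has no XML declaration, so nothing gets duplicated.
    return DOCTYPE + xml_text.lstrip()


fragment = '<sect1 id="demo"><title>Demo</title><para>Text.</para></sect1>'
print(with_doctype(fragment))
```

Writing the result to bad-title.with-dtd.xml gives a file that can be
passed to the xmllint command shown above.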
However, dblatex has to handle possible XIncludes and thus uses the
following command instead, which exposes an xmllint bug:
$ xmllint --noout --postvalid --xinclude bad-title.with-dtd.xml
bad-title.with-dtd.xml:21: element sect2: validity error : Syntax of value for attribute id of sect2 is not valid
Document bad-title.with-dtd.xml does not validate
$ echo $?
3
[1] https://bugs.debian.org/866716
[2] https://bugs.debian.org/856123
[3]
% -----------------------------------------
% Autogenerated LaTeX file from XML DocBook
% -----------------------------------------
%%<params>
%% document.language en
%%</params>
\documentclass{article}
\usepackage{ifthen}
\newboolean{DBKIsBook}
\setboolean{DBKIsBook}{false}
\IfFileExists{ifxetex.sty}{%
\usepackage{ifxetex}%
}{%
\newif\ifxetex
\xetexfalse
}
\ifxetex
\usepackage{fontspec}
\usepackage{xltxtra}
\defaultfontfeatures{Mapping=tex-text}
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
\else
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\fi
\usepackage{fancybox}
\usepackage{makeidx}
\usepackage[hyperlink]{docbook}
\renewcommand{\DBKreleaseinfo}{}
\setcounter{tocdepth}{5}
\setcounter{secnumdepth}{5}
\title{Demonstrate pandoc/dblatex problem with markdown}
\author{}
\hypersetup{%
pdfcreator={DBLaTeX-0.3.9-3},%
pdftitle={Demonstrate pandoc/dblatex problem with markdown},%
pdfauthor={}}
% ------------------
% Collaborators
% ------------------
\renewcommand{\DBKindexation}{
\begin{DBKindtable}
\DBKinditem{\writtenby}{}
\end{DBKindtable}
}
\makeindex
\makeglossary
\begin{document}
\label{demonstrate-pandocdblatex-problem-with-markdown}\hyperlabel{demonstrate-pandocdblatex-problem-with-markdown}%
\lstsetup
\section{Demonstrate pandoc/dblatex problem with markdown}
\label{demonstrate-pandocdblatex-problem-with-markdown}\hyperlabel{demonstrate-pandocdblatex-problem-with-markdown}%
This markdown dokument demonstrate a problem with pandoc/dblatex.
To demonstrate the problem, process it like this:
\begin{lstlisting}[firstnumber=1,]
pandoc -t docbook -f markdown+inline_notes -o bad-title.xml bad-title.md
dblatex bad-title.xml
\end{lstlisting}
It is tested with pandoc version 0.3.5-{}2 and dblatex version
1.12.4.2\textasciitilde{}dfsg-{}1+b14.
\subsection{Ártica}
\label{ártica}\hyperlabel{ártica}%
The problem is the Á in Ártica, which is causing dblatex to reject
the XML file like this.
\begin{lstlisting}[firstnumber=1,escapeinside={<:}{:>}]
Build the book set list...
Build the listings...
XSLT stylesheets DocBook - LaTeX 2e (0.3.5-2)
===================================================
Warning: the root element is not an article nor a book
Warning: sect1(demonstrate-pandocmarkdown-problem) wrapped with article
Build bad-title.pdf
pdflatex failed
bad-title.aux:26: Missing \endcsname inserted.
bad-title.aux:26: leading text: ...rtica}{{1.1}{1}{<:\&\#xfffd;:>rtica}{subsection.1.1}{}}
A possible reason for transformation failure is invalid DocBook
(as reported by xmllint)
Error: pdflatex compilation failed
\end{lstlisting}
\end{document}
[4]
Title: Demonstrate pandoc/dblatex problem with markdown
This markdown dokument demonstrate a problem with pandoc/dblatex.
To demonstrate the problem, process it like this:
pandoc -t docbook -f markdown+inline_notes -o bad-title.xml bad-title.md
dblatex bad-title.xml
It is tested with pandoc version 0.3.5-2 and dblatex version
1.12.4.2~dfsg-1+b14.
Ártica
The problem is the Á in Ártica, which is causing dblatex to reject
the XML file like this.
Build the book set list...
Build the listings...
XSLT stylesheets DocBook - LaTeX 2e (0.3.5-2)
===================================================
Warning: the root element is not an article nor a book
Warning: sect1(demonstrate-pandocmarkdown-problem) wrapped with article
Build bad-title.pdf
pdflatex failed
bad-title.aux:26: Missing \endcsname inserted.
bad-title.aux:26: leading text: ...rtica}{{1.1}{1}{�rtica}{subsection.1.1}{}}
A possible reason for transformation failure is invalid DocBook
(as reported by xmllint)
Error: pdflatex compilation failed
[5] https://bugs.debian.org/867378
Regards, Andreas
--
Andreas Hoenen <[email protected]>
GPG: 1024D/B888D2CE
A4A6 E8B5 593A E89B 496B
82F0 728D 8B7E B888 D2CE
--- End Message ---