Package: python3-lxml
Version: 4.6.3+dfsg-0.1
Severity: important
X-Debbugs-Cc: [email protected]
Dear Maintainer,

lxml truncates its output when "tostring" is called with encoding set to
"utf8", while it works correctly when encoding is set to "utf-8". The
attached "bug.py" file is an example that reproduces the problem: the
output printed under "Bad" has truncated text in the last subfield.
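
As a possible stopgap, the encoding name can be rewritten to the spelling that works before it reaches "tostring". The following is only a minimal sketch: the helper name normalise_utf8_name is hypothetical (it is not part of lxml), and it relies on nothing more than Python's own codec registry treating "utf8" and "utf-8" as the same codec.

```python
import codecs


def normalise_utf8_name(encoding: str) -> str:
    """Return "utf-8" for any spelling Python resolves to the UTF-8 codec.

    Hypothetical helper for illustration; other names are returned unchanged.
    """
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        # Not a Python codec name (for example lxml's special "unicode"
        # value), so leave it alone.
        return encoding
    return "utf-8" if canonical == "utf-8" else encoding


print(normalise_utf8_name("utf8"))     # utf-8
print(normalise_utf8_name("latin-1"))  # latin-1
```

With this in place the call would read tostring(record, encoding=normalise_utf8_name("utf8")). Whether that avoids the truncation on an affected system is an assumption based on the observation above that "utf-8" works.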

I previously reported this bug upstream at
https://bugs.launchpad.net/lxml/+bug/1944751, but further testing makes
me think that it is Debian-specific: when I run the attached "bug.py"
example in a new virtualenv in which I ran "pip install lxml", and hence
use the upstream binary wheel, the bug does not arise.

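
To compare the two installations, it may help to print which libxml2 and libxslt each lxml build was compiled against and is running with. This is a sketch only: the helper name lxml_build_info is made up for illustration, while the version attributes are the ones exposed by lxml.etree. It returns None when lxml cannot be imported.

```python
def lxml_build_info():
    """Collect version details of the lxml build visible to this interpreter.

    Hypothetical diagnostic helper; returns None if lxml is not installed.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (running)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (running)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
        "module path": etree.__file__,
    }


# Run once with the system interpreter and once inside the virtualenv.
for key, value in (lxml_build_info() or {}).items():
    print(f"{key}: {value}")
```

If the two runs report different libxml2 versions, that would be consistent with the problem being specific to the Debian build, although it would not by itself prove the cause.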
Best,
Micha
-- System Information:
Debian Release: 11.0
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 5.10.0-8-amd64 (SMP w/8 CPU threads)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8),
LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages python3-lxml depends on:
ii libc6 2.31-13
ii libxml2 2.9.10+dfsg-6.7
ii libxslt1.1 1.1.34-4
ii python3 3.9.2-3
Versions of packages python3-lxml recommends:
ii python3-bs4 4.9.3-1
ii python3-html5lib 1.1-3
Versions of packages python3-lxml suggests:
pn python-lxml-doc <none>
pn python3-lxml-dbg <none>
-- no debconf information
from lxml.builder import E
from lxml.etree import tostring
RECORD = E.record
CONTROLFIELD = E.controlfield
DATAFIELD = E.datafield
SUBFIELD = E.subfield
INPUT_DATA = {
    "520": [
        {
            "9": "APS",
            "a": 'The first measurement of the dependence of <math display="inline"><mrow><mi>γ</mi><mi>γ</mi><mo stretchy="false">→</mo><msup><mrow><mi>μ</mi></mrow><mrow><mo>+</mo></mrow></msup><msup><mrow><mi>μ</mi></mrow><mrow><mo>−</mo></mrow></msup></mrow></math> production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at <math display="inline"><mrow><msqrt><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>N</mi><mi>N</mi></mrow></msub></mrow></msqrt><mo>=</mo><mn>5.02</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><mi>TeV</mi></mrow></math>, with an integrated luminosity of approximately <math display="inline"><mrow><mn>1.5</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><msup><mrow><mi>nb</mi></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup></mrow></math>, are collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region <math display="inline"><mrow><mn>8</mn><mo><</mo><msub><mrow><mi>m</mi></mrow><mrow><mi>μ</mi><mi>μ</mi></mrow></msub><mo><</mo><mn>60</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><mi>GeV</mi></mrow></math> are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range <math display="inline"><mrow><mrow><mo stretchy="false">|</mo><mi>η</mi><mo stretchy="false">|</mo></mrow><mo>></mo><mn>8.3</mn></mrow></math>. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.',
        },
        {
            "9": "arXiv",
            "a": "The first measurement of the dependence of $\\gamma\\gamma$$\\to$$\\mu^{+}\\mu^{-}$ production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at $\\sqrt{s_\\mathrm{NN}} =$ 5.02 TeV, with an integrated luminosity of approximately 1.5 nb$^{-1}$, were collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region 8 $\\lt$$m_{\\mu\\mu}$$\\lt$ 60 GeV are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range $|\\eta|$$\\gt$ 8.3. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.",
        },
    ]
}
record = RECORD()
for tag, values in sorted(INPUT_DATA.items()):
    for value in values:
        datafield = DATAFIELD({"tag": tag, "ind1": " ", "ind2": " "})
        for code, el in sorted(value.items()):
            datafield.append(
                SUBFIELD(el, {"code": code})
            )
        record.append(datafield)
utf8_bad = tostring(record, encoding="utf8")
utf8_good = tostring(record, encoding="utf-8")
print("Bad:", utf8_bad, "\n", "Good:", utf8_good, sep="\n")
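
Because both results are long byte strings, the point where the truncation starts is easier to find programmatically than by eye. The following is a minimal sketch using only the standard library; the helper name first_difference is hypothetical, and the two sample byte strings are short stand-ins, not actual output from the script above.

```python
def first_difference(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if a == b."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    if len(a) != len(b):
        # One input is a prefix of the other: they diverge where the
        # shorter one ends.
        return min(len(a), len(b))
    return None


# Stand-in data for illustration only.
sample_bad = b"<subfield>The first measure</subfield>"
sample_good = b"<subfield>The first measurement</subfield>"

offset = first_difference(sample_bad, sample_good)
print(offset)                              # 27
print(sample_good[offset - 10:offset + 10])
```

In the script above the call would be first_difference(utf8_bad, utf8_good). Note that the two outputs may also legitimately differ elsewhere, for example in an XML declaration or in how non-ASCII characters are escaped, so the first offset reported is not necessarily the truncation point.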