On 29/09/2018 at 09:03, Christophe JAILLET wrote:
Hi,
There is a bug in Xalan, the XSLT 1.0 engine we are using, that
prevents our doc build tool chain from generating correct documentation.
By "not correct", I mean:
- ISO-8859-1 non-ASCII characters are again replaced by their HTML
entity equivalents (in the French doc, for example)
- man page generation is broken
A possible workaround is to use UTF-8 instead of ISO-8859-1.
The only drawback I see is that some HTML files will be slightly
bigger and a bit less readable because of the use of entities. This is
not that big a problem.
However, Xalan has looked mostly unmaintained for about 10 years.
Xalan-Java 2.7.1, the one we are using, was released in November 2007.
The latest release is Xalan-Java 2.7.2, released in April 2014. This
is mainly a maintenance release which fixes a CVE and a few bugs.
As per Xalan documentation, the JDK or JRE 1.3.x (2000), 1.4.x (2002),
or 5.x (2004) is required.
As per our doc build documentation, we need at least Java 1.2 (1998)
to build the doc.
Java 8 (2014) is LTS and is supported until 2025
Java 9 (2017)
Java 10 (2018)
Java 11 (2018) is apparently LTS
Another XSLT engine could also be used. I've tried the latest Saxon
9.8.0.14 (well, a 9.9.0.1 was released 2 days ago, but there are no
details yet).
My first results are:
- a "build.sh all" takes ~3 min, instead of ~2 min with Xalan
- generated files look just fine
- it removes some whitespace-only XML nodes. The generated code is
slightly smaller.
The generated code could be slightly less readable in some cases.
I've not seen any issue in the rendering of the pages without these
few spaces
- Saxon is XSLT 3.0. This could be used to simplify our xsl files.
However, I've looked at the 2.0 and 3.0 changes, and I'm not sure
we could make real use of them. Maybe dynamic XPath, more built-in
functions, or Text Value Templates?
Not sure either that we need to upgrade the rules at all. It
already works great.
- this is a drop-in replacement. We just need to replace 2 jar files
with a new one. That's all
- We only need to change a <func:function to a <xsl:function and
the doc builds out of the box
- as per the Saxon doc, it requires Java 5+, 6+ or 8+ depending on the
version we take
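To illustrate the "drop-in" point, here is a minimal sketch, assuming
the engine is reached through the standard JAXP API
(javax.xml.transform); this mail does not say how our build actually
invokes it. The class name, the tiny stylesheet and the sample input
are all hypothetical. The Java code never names Xalan or Saxon: JAXP
picks whichever engine is configured at run time, and the output
encoding is just a parameter of xsl:output.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltEncodingDemo {

    // Hypothetical one-template XSLT 1.0 stylesheet; only the output
    // encoding varies between calls.
    static String stylesheet(String encoding) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"text\" encoding=\"" + encoding + "\"/>"
             + "<xsl:template match=\"/\"><xsl:value-of select=\"doc\"/></xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the stylesheet on the input and returns the raw output bytes.
    static byte[] transform(String xml, String xslt) {
        try {
            // JAXP chooses the engine at run time (system property
            // javax.xml.transform.TransformerFactory, a service entry in a
            // jar on the classpath, or the JDK default).
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The French word for "summer", written with Unicode escapes so the
        // source file itself stays pure ASCII.
        String xml = "<doc>\u00e9t\u00e9</doc>";
        for (String enc : new String[] {"ISO-8859-1", "UTF-8"}) {
            byte[] result = transform(xml, stylesheet(enc));
            System.out.println(enc + ": " + result.length + " bytes");
        }
    }
}
```

Running the same class with a different engine on the classpath is a
quick way to compare what each one writes for a given encoding.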
So, now it is your turn to give your opinion:
Do we need to change something?
==============================
[ ] this mail is too long, do whatever you want, I just want something
that works
[ ] no. I can live with the current tool chain
[x] yes. Let's clean off some dust and update what is needed
What version of XSLT is best for us?
===================================
[ ] 1.0 - this is what I'm used to, keep things stable
[ ] 2.0
[x] 3.0 - the later the better, and/or the new functionalities rock!
Should we change our XSLT engine?
================================
[ ] No, I love Xalan and it is ASF. Just move to UTF-8 everywhere.
[x] Yes and Saxon is a good candidate. The license of the Home Edition
is Mozilla Public License version 2.0.
[ ] Yes and ______ should be used instead
What is the oldest version of Java we should support?
====================================================
[ ] 1.2 - what we claim now
[ ] 1.3 - what is required by Xalan 2.7.1
[ ] 1.4
[ ] 5.0 - what is required by Saxon 9.6
[ ] 6 - what is required by Saxon 9.7 and 9.8
[ ] 7
[ ] 8 - what is required by the latest Saxon 9.9
[ ] 9
[ ] 10
[ ] 11
Depending on the consensus on the minimum Java requirement, we could
also ask whether:
- we still need the jakarta-oro regex parser (ASF, but retired since
2010-09-01). Regexes in Java have been considered stable for a long time now
[ ] keep it
[ ] Axe it
- we need to upgrade Ant. (Latest is 1.10.5. Ant 1.9.*: JDK 1.5+,
Ant 1.8.*: JDK 1.4+, Ant 1.7.*: JDK 1.3+, Ant 1.6.*: JDK 1.2+)
[ ] keep 1.6.5, we don't need to change
[ ] 1.9.x, recent enough, still maintained, but not the latest.
Should be the most stable
[ ] 1.10.x, the later the better, and/or the new functionalities
rock!
- any other topic?
In addition to the XSLT version and XSLT engine updates, why not use
UTF-8 instead of ISO-8859-1?
It works for all languages, while ISO-8859-1 only covers Western
languages.
UTF-8 HTML files are just a bit longer than ISO-8859-1 ones: for
example, bind.html.fr is 1% longer in UTF-8 than in ISO-8859-1.
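A rough way to see where the extra size comes from (a sketch only, not
part of the build; the class and method names are hypothetical, and
StandardCharsets needs Java 7+): ASCII characters take one byte in
both encodings, while each accented Latin-1 character takes one byte
in ISO-8859-1 and two in UTF-8.

```java
import java.nio.charset.StandardCharsets;

class EncodingSize {

    // Bytes needed to store the text in ISO-8859-1 (one byte per character).
    static int latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed to store the text in UTF-8 (one byte per ASCII character,
    // two per accented Latin-1 character).
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample sentence, not taken from the real French doc.
        // Accents are written as Unicode escapes to keep the source ASCII.
        String sample = "Cette directive d\u00e9finit le r\u00e9pertoire utilis\u00e9.";
        int latin1 = latin1Bytes(sample);
        int utf8 = utf8Bytes(sample);
        System.out.println("ISO-8859-1: " + latin1 + " bytes");
        System.out.println("UTF-8:      " + utf8 + " bytes");
        System.out.println("Overhead:   " + (utf8 - latin1) + " bytes");
    }
}
```

The overhead is one byte per accented character, so a mostly-ASCII
HTML file grows only slightly.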
As for readability, I see no difference between UTF-8 and
ISO-8859-1 files.
Finally, it solves both problems:
--- man pages are correctly encoded
--- HTML files are generated without HTML entities.
I'll try to rebuild the French doc in UTF-8 this weekend.
Thanks for your feedback,
CJ
---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscr...@httpd.apache.org
For additional commands, e-mail: docs-h...@httpd.apache.org
---------------------------------------------------------------------