On 29/09/2018 at 09:03, Christophe JAILLET wrote:
Hi,

There is a bug in Xalan, the XSLT 1.0 engine we are using, that prevents our doc build tool chain from generating correct documentation.

By "not correct", I mean:
   - ISO-8859-1 non-ASCII characters are again replaced by their HTML entity equivalents (in the French doc, for example)
   - man page generation is broken

A possible workaround is to use UTF-8 instead of ISO-8859-1.
The only drawback I see is that some HTML files will be slightly bigger and a bit less readable because of the use of entities. This is not that big a problem.

However, Xalan has looked mostly unmaintained for about 10 years. Xalan-Java 2.7.1, the one we are using, was released in November 2007. The latest release, Xalan-Java 2.7.2, came out in April 2014. It is mainly a maintenance release which fixes a CVE and a few bugs.


As per the Xalan documentation, JDK or JRE 1.3.x (2000), 1.4.x (2002), or 5.x (2004) is required. As per our doc build documentation, we need at least Java 1.2 (1998) to build the doc. For comparison, the more recent Java releases are:
Java 8 (2014) is LTS and is supported until 2025
Java 9 (2017)
Java 10 (2018)
Java 11 (2018) is apparently LTS


Another XSLT engine could also be used. I've tried the latest Saxon 9.8.0.14 (well, a 9.9.0.1 was released 2 days ago, but there are no details yet).
My first results are:
   - a "build.sh all" takes ~3 min, instead of ~2 min with Xalan
   - generated files look just fine
   - it removes some whitespace-only XML nodes. The generated code is slightly smaller.
     The generated code could be slightly less readable in some cases. I've not seen
     any issue in the rendering of the pages without these few spaces
   - Saxon is XSLT 3.0. This could be used to simplify our xsl files.
     However, I've looked at the 2.0 and 3.0 changes, and I'm not sure we would have
     a real use for them. Maybe dynamic XPath, more built-in functions, or
     Text Value Templates?
     I'm not sure either that we need to upgrade the rules at all. They already work great.
   - this is a drop-in replacement. We just need to replace 2 jar files
     with a new one. That's all
   - We only need to change a <func:function to a <xsl:function and the doc builds
     out of the box
   - as per the Saxon doc, it requires Java 5+, 6+ or 8+ depending on the version we take
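
One way to see why swapping jar files can be enough: if the build reaches the XSLT engine through the standard JAXP API (an assumption here, not something checked against our build scripts), the engine is whichever implementation JAXP finds at run time. The sketch below is only an illustration; the class name, the one-template stylesheet and the sample input are made up and are not part of the doc build. It uses XSLT 1.0 only, so it runs with the engine bundled in the JDK as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EngineSwapSketch {
    // Hypothetical stylesheet for illustration; not taken from the httpd docs.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/doc'><xsl:value-of select='title'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet on the given XML and returns the raw output bytes
    // in the requested encoding.
    static byte[] transform(String xml, String encoding) throws Exception {
        // JAXP picks the engine available at run time; no engine class is named here.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer t = factory.newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Shows which engine JAXP selected in this environment.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        byte[] utf8 = transform("<doc><title>D\u00e9tails</title></doc>", "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

The first line printed by `main` names the engine in use, which is a quick way to check which jar actually won on a given classpath.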


So, now it is your turn to give your opinion:

Do we need to change something?
==============================
[ ] this mail is too long, do whatever you want, I just want something that works
[ ] no. I can live with the current tool chain
[x] yes. Let's clean off some dust and update what is needed


What version of XSLT is best for us?
===================================

[ ] 1.0 - this is what I'm used to, keep things stable
[ ] 2.0
[x] 3.0 - the later the better, and/or the new functionalities rock!


Should we change our XSLT engine?
================================
[ ] No, I love Xalan and it is ASF. Just move to UTF-8 everywhere.
[x] Yes and Saxon is a good candidate. The license of the Home Edition
    is Mozilla Public License version 2.0.
[ ] Yes and ______ should be used instead


What is the oldest version of Java we should support?
====================================================
[ ] 1.2 - what we claim now
[ ] 1.3 - what is required by Xalan 2.7.1
[ ] 1.4
[ ] 5.0 - what is required by Saxon 9.6
[ ] 6 - what is required by Saxon 9.7 and 9.8
[ ] 7
[ ] 8 - what is required by the latest Saxon 9.9
[ ] 9
[ ] 10
[ ] 11


Depending on the consensus on the minimum Java requirement, we could also wonder if:
   - we still need the jakarta-oro regex parser (ASF, but retired since 2010-09-01).
     Regexes in Java have been considered stable for a long time now
     [ ] keep it
     [ ] Axe it

   - we need to upgrade Ant. (Latest is 1.10.5. Ant 1.9.*: JDK 1.5+, Ant 1.8.*: JDK 1.4+, Ant 1.7.*: JDK 1.3+, Ant 1.6.*: JDK 1.2+)
     [ ] keep 1.6.5, we don't need to change
     [ ] 1.9.x, recent enough, still maintained, but not the latest. Should be the most stable
     [ ] 1.10.x, the later the better, and/or the new functionalities rock!

   - any other topic?
In addition to the XSLT version and XSLT engine updates, why not use UTF-8 instead of ISO-8859-1? It works for all languages, while ISO-8859-1 only covers Western languages.

UTF-8 HTML files are just a bit larger than ISO-8859-1 ones: for example, bind.html.fr is 1% larger in UTF-8 than in ISO-8859-1. As for readability, I see no difference between UTF-8 and ISO-8859-1 files.
Finally, it solves both problems:
--- man pages are correctly encoded
--- html files are generated without HTML entities.
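
To make the size argument concrete, here is a rough sketch comparing the byte count of one short French phrase written three ways: raw ISO-8859-1, raw UTF-8, and ASCII with numeric character references. The class name, the helper methods and the sample phrase are invented for illustration, and the numbers say nothing about the actual overhead on a whole page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {
    // Number of bytes needed to store the text in the given charset.
    static int sizeIn(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Replaces every non-ASCII code point with a numeric character
    // reference such as &#233; (illustrative helper, not the doc build's code).
    static String withEntities(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "r\u00e9seau priv\u00e9"; // two accented letters out of 12 characters
        System.out.println("ISO-8859-1: " + sizeIn(sample, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + sizeIn(sample, StandardCharsets.UTF_8));
        System.out.println("entities:   "
            + sizeIn(withEntities(sample), StandardCharsets.US_ASCII));
    }
}
```

Each accented Latin-1 letter costs one extra byte in UTF-8, while a numeric entity costs several, so on text that is mostly ASCII markup the UTF-8 overhead stays small.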

I'll try to rebuild the French doc in UTF-8 this weekend.


Thanks for your feedback,
CJ


---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscr...@httpd.apache.org
For additional commands, e-mail: docs-h...@httpd.apache.org


