Re: About using ascii-ids attribute

Lex Trotman Fri, 18 Apr 2014 06:23:55 -0700

On 18 April 2014 20:38, Stéphane Gourichon <[email protected]> wrote:
> Le 18/04/2014 11:17, Lex Trotman a écrit :
>
>
>>
>> [my_chapter_title]
>> [[my_chapter_title]] or if you are like me [[mt]] :)
>
>
> That runs a high risk of collision. :-)


True, but not an auto generated collision :)

[...]
>
> The real nasty hack is having humans write *manual* (therefore fixed)
> references to ids that are *generated* depending on a *particular*
> implementation.
>
> That is the nasty hack, not the choice of generating ascii or unicode ids.

Indeed, but then it's something that I have seen done, and have even
done myself in the past.

For English (read ASCII) titles the algorithm degenerates to "prefix
with _, lowercase and replace spaces with _", so it's not too hard (and
for programmers documenting their code it even looks suitably
identifier-ish :)  Incidentally, I'm not sure why it converts to
lowercase.
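
As a rough illustration of the ASCII case, here is a minimal Python
sketch. It assumes the rule is exactly "underscore prefix, lowercase,
spaces become underscores". The name ascii_title_to_id is made up for
this example, and this is not asciidoc's actual code, which also has to
deal with punctuation, non-ASCII titles and duplicate titles.

```python
def ascii_title_to_id(title, prefix="_"):
    """Hypothetical helper: derive an id from a plain ASCII title.

    Assumption: the simplified rule described in the message
    (underscore prefix, lowercase, spaces become underscores).
    """
    return prefix + title.strip().lower().replace(" ", "_")


print(ascii_title_to_id("My Chapter Title"))  # _my_chapter_title
```

A hand-written reference to _my_chapter_title only keeps working for as
long as the generator keeps producing that exact string.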

>
>
> Remember the difference between interface and implementation. Documents
> should obey the asciidoc interface (syntax, etc). The id generation
> algorithm is not part of the interface. The generated ids could be UTF-8,
> ROT13, MD5, chapter text translated to klingon through an HTTP call,
> whatever, even with randomly changing parts between runs, the document
> should not rely on that implementation detail and compile in all cases.
>
>
> Now that we've realized that manually linking to generated ids is broken
> anyway and should not be done, we can think correctly about the real value
> of generated ids.
>
> It is okay to use generated ids, in generated links only.
>
> It happens that full unicode ids are not portable, ascii ids are portable.

To be clear, using Unicode ids is correct, that's what the Docbook
standard specifies; it's a *bug* in the toolchain that it does not
accept them.  That particular toolchain does *not* comply with the
Docbook standard.  Other toolchains do.

>
> And by "portable" I mean already with asciidoc implementation, not even with
> another implementation like asciidoctor.
>
> So, when using generated ids correctly (that is, in automatic links only),
> ascii ids are safer and have no drawback.
>
> To summarize :
> * writing manual link to generated ids and hope it will work: nasty hack --
> broken in theory, works in some cases in practice but fragile.

But unfortunately used in existing documents.

> * generating non-ascii ids (whether they are used or not): legit in theory,
> in practice breaks even simple documents with some backends. That's what
> "non-portable" means.

No, Asciidoc does not break anything; the toolchain breaks it.  Go bang
on their door to get it fixed.

> * generating ascii ids: legit in theory, safe in all practical cases.
>
> Is my point clear about where the nasty hack is and is not ?
>

It's a nasty hack if something like Asciidoc has to generate restricted
output because the toolchain is broken.

>
>
>
>>
>>> For that reason, making an
>>> explicit id ensures identity and is the right thing to do.
>>
>> Yes, but boring and messy.
>
>
> Yes : agreed. Boring ? I disagree. Messy ? On the contrary, relying on
> generated ids is messy.

By messy I mean that it adds visual clutter to the document that is
purely for the purpose of implementation details.  Not that I have a
solution, there has to be a way of disambiguating between several
occurrences of the same title.  But the fewer such artefacts the
better.  We are writing documents, not programs.
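
To make the disambiguation problem concrete, here is a small Python
sketch of a generator that tells repeated titles apart with a numeric
suffix. The suffix scheme and the name assign_ids are assumptions
invented for this illustration, not a description of what asciidoc
does.

```python
def assign_ids(titles, prefix="_"):
    """Hypothetical generator: give each title an id, adding a numeric
    suffix (_2, _3, ...) when the same title appears again.

    Assumption: the suffix scheme is invented for this illustration.
    """
    seen = {}
    ids = []
    for title in titles:
        base = prefix + title.lower().replace(" ", "_")
        seen[base] = seen.get(base, 0) + 1
        count = seen[base]
        ids.append(base if count == 1 else "%s_%d" % (base, count))
    return ids


print(assign_ids(["Usage", "Examples", "Examples"]))
# ['_usage', '_examples', '_examples_2']
```

Which "Examples" section ends up with the suffix depends only on
document order, so a hand-written link to it changes target when
sections are reordered.  An explicit id avoids that, at the cost of the
clutter mentioned above.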

>
>
>>
>> It will break existing documents that refer to autogenerated ids.
>> Having archived documents fail to generate properly is a bad thing :(
>
>
> Can you name some of those documents, are there some publicly on the
> Internet ?
>
> My opinion is: those documents, if they exist, *are* already broken. In
> theory they are a nasty hack, in practice they are fragile.

Sure, but that's not a reason to stop them building.

>
> They are broken like a source code that happens to compile on one compiler
> and be rejected by a more correct one which rejects ambiguous or fragile
> constructs.
>
> Worse, they are broken like a source code that happens to compile on one
> compiler and be rejected by the same compiler when a non-interface-breaking
> option of the compiler is enabled (that is, an option where a compliant
> input will get processed correctly with or without the option).
>
> Do they work with another asciidoc implementation ? I guess probably not.
>

They would if the titles are ASCII, but asciidoctor seems to just cut
all non-ascii characters.  It doesn't appear to use the rather complex
conversion asciidoc does.  In retrospect it's probably right.
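
The difference between the two approaches can be sketched in Python as
follows.  Both functions are hypothetical stand-ins written for this
comparison (neither is the real asciidoc or asciidoctor code): one
simply drops non-ASCII characters, the other assumes a Unicode
decomposition step first so that accented letters keep their base
letter.

```python
import unicodedata


def drop_non_ascii(text):
    """Strategy A (assumed): simply discard every non-ASCII character."""
    return text.encode("ascii", "ignore").decode("ascii")


def fold_to_ascii(text):
    """Strategy B (assumed): decompose accented letters (NFKD) first,
    then discard whatever is still non-ASCII, so an accented 'e'
    survives as a plain 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "Résumé général", written with escapes so the source stays ASCII
title = "R\u00e9sum\u00e9 g\u00e9n\u00e9ral"
print(drop_non_ascii(title))  # Rsum gnral
print(fold_to_ascii(title))   # Resume general
```

Either way the result is pure ASCII; the two only differ in how
readable the id remains for titles with accented characters.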

>
> Worse is that this asciidoc implementation chokes on perfectly legal
> documents when using default latex toolchain.

And that, as I keep saying, is the tex part of the Latex part of the
dblatex toolchain that is at fault.  It is not asciidoc.  As I have
said I understand that you can use an alternative tex implementation
such as xetex which is unicode compliant, and may allow unicode ids if
dblatex passes them through. But I'm not sure, and since alternative
toolchains work fine I'm not about to try it.

> This is, in practice, very bad
> for users and a sign that asciidoc interface is good, but leaves room for
> other, more compliant, asciidoc implementations.

Yes, use the more compliant fop toolchain, it accepts all legal XML ids.

Just say --fop.

I would change it to be the default, but then documents built with the
standard command would look different, and again, breaking existing
documents is a bad thing.  Even if it's only changing the look by
changing the default option on the a2x convenience wrapper.

This may still need to be done in the future anyway: dblatex has not
been updated for nearly two years and there has been no response to
any bug reports since then either.  But so long as it continues to be
reasonably available and reasonably usable it's not necessary to
change.

Just for the record, asciidoctor provides fopub, an a2x-like script to
generate PDF, but it supports fop only.  So maybe it won't hurt to
make dblatex the deprecated non-default in the future.

>
> After the change, in theory some documents that compiled okay would not
> compile. That's called a "breaking change". Breaking changes should be
> performed only after sanity checks. After validation they are the way to go.
>
>
> Back to your objection, it is mostly theoretical and defending bad practice.
>
> When asciidoc implementation is fixed it would be nice to include a breaking
> change explanation about those theoretical documents and instructions on how
> to fix them.

Documents have a long lifetime; preventing them from being built by
changing the tools and requiring them to be updated is a bad thing.
Even if they contain what today would be considered the equivalent of
using gotos :)

You are clearly thinking about new, current documents, that are
supported and updated, in which case it's fine to require changes.

But, unlike code, documents are archived, and old documents are still
relevant, and preventing them from being built makes such archives
useless. (Like archiving documents in old Microsoft formats)

>
> If people object about the new code citing collections that can't be fixed,
> then asciidoc may consider compatibility with broken documents. For example
> with a compatibility flag. Users of those documents can stick to the old
> release until the flag is available.

Well, asciidoc has a compatibility flag, just don't define ascii-ids
:)  This is what existing documents do.  New documents can add
:ascii-ids: to the top of the file if they need to use the broken
toolchain.

>
> In all other situations, the first move (fix the implementation) was the
> only effort to do.
>
> To summarize :
>
> * documents manually linking to changeable and fragile generated ids is bad
> practice and makes non-portable, fragile documents

Agree, and should be recommended against.

> * it's a free software world, release early release often, software that
> helps people replaces software that is broken.

That's a bad mantra for document software as I explained above.

Cheers
Lex

>
> Thinking about theory is good, but move on and see what the outside world
> decides.
>
> Regards,
>
>
> --
> Stéphane Gourichon
>
> --
> You received this message because you are subscribed to the Google Groups
> "asciidoc" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/asciidoc.
> For more options, visit https://groups.google.com/d/optout.
