Follow-up Comment #27, bug#63074 (group groff):

[comment #25:]
> I hear your expression of urgency but I don't think "stringhex" is a good
long-term solution to what ails us.  You are correct in comment #22 that I did
not correctly apprehend at first what it was for.  I thought you developed it
because we had no way to reliably transmit arbitrary byte sequences to device
control commands.  But we did, sort of--it just needed to be made consistent
and reliable.  That it wasn't is what my test case attempts to illustrate and
what the fix to bug #64484 attempts to prove.

Yes, I want .device to pass everything exactly "as-is", just as it does now,
with all escapes left untouched. I don't understand your opposition to
stringhex. If I remember correctly, you didn't like "shouty" capitals in
.SH/.Sh and people generally agreed, so you added .stringup/.stringdown to
enforce the case change - fair enough. Similarly, people on the list have
asked for PDF support for non-Latin languages, and to support all the
features currently supported for Latin languages I need stringhex or a
suitable alternative. So far none of your suggestions solves the problem,
which suggests that I have not explained it well enough. Let's look at a real
example. The file mom-pdf.mom in the mom/examples directory contains:

.HEADING 2 NAMED internal "Creating internal links"
....
.PP
Furthermore, \*[cod]NAMED\|<id>\*[codx] stores the text of the
heading for use later on when linking to it (see
.PDF_LINK internal SUFFIX ). +
If headings are being numbered, the heading number is prepended.

This produces what is shown in the attached PNG file. However, if this were
written by a Greek speaker (or Google Translate!) it might look more like:

.HEADING 2 NAMED εσωτερικό "Δημιουργία εσωτερικών συνδέσεων"
....
.PP
Επιπλέον, \*[cod]NAMED\|<id>\*[codx] αποθηκεύει το
κείμενο του επικεφαλίδα
για χρήση αργότερα κατά τη σύνδεση με
αυτήν (βλ
.PDF_LINK εσωτερικό SUFFIX ). +
Εάν οι επικεφαλίδες αριθμούνται,
προηγείται ο αριθμός της επικεφαλίδας.

In either case the string at the end of the HEADING line has to be available
(using the .PDF_LINK key) so that the "+" symbol can be replaced by the given
text. This means that whatever is given as the NAMED <id> has to be used as
part of the name of the array entry which holds the heading string. For text
converted to \[uXXXX] form this would not be a legal identifier, hence the
need to munge it into something acceptable which can later be reconstituted.
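
As an illustration only, here is a minimal Python sketch of the munging idea.
It is not the groff implementation of stringhex. The id is turned into hex
digits, which are plain ASCII and so safe inside a string name, and the
original id can be recovered from them. The function names `to_hex_id` and
`from_hex_id`, the choice of UTF-8 bytes and the uppercase hex digits are all
assumptions made for this sketch; a dict stands in for groff's strings.

```python
def to_hex_id(name: str) -> str:
    """Encode an arbitrary id as uppercase hex digits of its UTF-8 bytes."""
    return name.encode("utf-8").hex().upper()


def from_hex_id(hex_id: str) -> str:
    """Recover the original id from its hex form."""
    return bytes.fromhex(hex_id).decode("utf-8")


# A dict stands in for groff's strings in this sketch (an assumption).
strings = {}

heading_id = "εσωτερικό"
key = "pdf:look(" + to_hex_id(heading_id) + ")"
strings[key] = "Δημιουργία εσωτερικών συνδέσεων"

# The key is pure ASCII, and the id can be reconstituted from its hex form.
assert key.isascii()
assert from_hex_id(to_hex_id(heading_id)) == heading_id
```

The point of the round trip is that the user keeps writing the id in their own
language, while the name actually stored contains nothing but ASCII.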

Your example using \A'' is not helpful, since it would reject any non-Latin
language being used as a NAMED <id>.
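
To show roughly why such an id is rejected, the Python sketch below converts
non-ASCII characters to \[uXXXX] form and then applies a deliberately
simplified identifier check. Both functions are hypothetical stand-ins: the
conversion only approximates what a preprocessor would emit, and the check is
an assumed model (non-empty, ASCII only, no whitespace, no backslash), not
groff's actual \A logic.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\[u{:04X}]".format(ord(ch))
        for ch in text
    )


def looks_like_valid_id(text: str) -> bool:
    """Simplified stand-in for an identifier check (an assumption):
    non-empty, ASCII only, no whitespace and no backslash."""
    return (
        text != ""
        and text.isascii()
        and not any(ch.isspace() or ch == "\\" for ch in text)
    )


converted = to_groff_escapes("εσωτερικό")
print(converted)
print(looks_like_valid_id("internal"))   # Latin id passes the check
print(looks_like_valid_id(converted))    # converted Greek id does not
```

Under this model every non-Latin id fails, because after conversion it
consists entirely of escape sequences.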

> No, I accept your premise that the main driver behind "stringhex" was this:
>
> > The problem lies in the original pdfmark API, if you look at the
pdfmark.pdf you will see that in the sections describing .pdfhref M and
.pdfhref L which both refer to a "dest-name" and "descriptive text", it says
that if a dest-name is not given the first word in the description is used as
the dest-name.
>
> I appreciate your explanation.  If the problem was with the pdfmark API,
then let's fix the pdfmark API.
>
> In particular, this:
>
> > if a dest-name is not given the first word in the description is used as
the dest-name
>
> ...strikes me as short-sighted, especially without any validation going on.
A textual description of a hyperlink/bookmark might contain all sorts of crazy
stuff.  (Like Cyrillic or CJK characters or, worse, motion or type-size or
font-selection escape sequences.)

The pdfmark API was around well before I wrote gropdf, so I just had to
support it.

What about embedded calls to macros (i.e. \*[macro arg])? I deal with those as
well. Here's an example, again from mom-pdf.mom:

.AUTHOR "Deri James" "\*[UP .5p]and" "Peter Schaffter"

Mom's AUTHOR macro predates PDF integration, so it made sense to just add:

.pdfinfo /Author \\$*

(or similar) to the macro. This produces the following in the grout file:

x X ps:exec [/Author (Deri James, \v'-.5p'and, Peter Schaffter,) /DOCINFO pdfmark

(using the pdf.tmac on my branch), and if you look at the finished PDF you
will see a pristine author attribution in the document properties.

>Assuming that it was going to be a well-behaved sequence of ASCII bytes or
even that one could "sanitize" or "cln" one's way through was a hopeless
notion.  That won't be practical until we have a string iterator and more
conditional expressions that enable the user of an iterator to identify the
type of each item in an iterated string/macro/diversion.  But if I understand
you correctly, we don't need that fancy new stuff to solve the present
problem, with stringhex or without.

No such assumption is made. I expect the data to be dirty; I have removed
an*cln and asciify, and never used sanitize. Stringhex is there to make the
NAMED <id> acceptable for use as a string name. I need none of your "fancy
new stuff".

> It would probably benefit me to look up Peter's documentation on _mom_'s
"HEADING" macro.  It is a bit baffling to me that one has to repeat arguments
like this:
>

> .HEADING 1 NAMED Гуляйпольщина "Гуляйпольщина"
> ...
> .PDF_LINK Гуляйпольщина PREFIX ( SUFFIX ) "see: +"


This is a bad example. Perhaps the real-life examples above will help.

> > Where the "+" is replaced by the contents of the string register
pdf:look(Гуляйпольщина), which would actually be a string of
\[uXXXX] nodes, so would generate an error. This is what stringhex is for, to
hide the contents so that groff does not see it as a sequence of nodes. The
ideal solution would be to allow string registers to have an attribute (say
"glass") which signals that groff should never try to interpret its contents,
i.e. operate as if the escape mechanism was turned off just for the contents
of that register, and have a way of turning that attribute on/off or an escape
which sets the attribute for the enclosed string.
>
> Right now I don't understand why we would need to elaborate a fairly
fundamental *roff language data type (the string) with a "glass" attribute
when, if you have a list indexed by a number or a _valid_ identifier, you can
simply define a string using a list item's index as a prefix.

A numeric id is no good, since you don't know which refno has been assigned
to each destination target unless you count the HEADINGs and PDF_TARGETs from
the top of the document and insert that number in a PDF_LINK. And, as I hope
you have grasped by now, it would be a bit rich to insist that users must use
English for their named destinations simply to ensure a valid identifier.


> .nr refno 1
> .de DEFREF
> .  nr refno +1
> .  ds ref*id!\n[refno]!tag \\$1
> .  ds ref*id!\n[refno]!author \\$2
> .  ds ref*id!\n[refno]!desc \\$3
> .  ds ref*id!\n[refno]!year \\$4
> ..
> .DEFREF story "Dupr\[e aa]" "Best \%Story\%Book Ever" 1989


>
> That's a simplified example of how macro packages have been implementing
arrays of data structures for decades, complete with idioms for "*" and "!",
which are not imposed by the language in any way.  Maybe I'm missing
something.

Yes. The fact that users may wish to use their own language for target
destinations.

> As it happens, this bug is probably fixed, too--I simply need to come up
with a convincing acceptance criterion for it.  A bit tough without adding a
feature to an existing output driver.  I trust it's obvious that, with
appropriate escaping, one can transmit "\000\001..\377" or "\x00..\xff" or
"\[u0000]..\[u00FF]".

Transmit! Where!! Oh, you mean this bug, about the PDF outline not capturing
Cyrillic text, which was fixed in my branch 6 months ago. What feature and
output driver are you talking about?

> I will try to make some time to reply to comment #22 more thoughtfully soon.
 Leaving in "Need Info" status and assigned to myself for that reason.


(file #55576)

    _______________________________________________________

Additional Item Attachment:

File name: internal.png                   Size:16 KB
    <https://file.savannah.gnu.org/file/internal.png?file_id=55576>


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>
