Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):
> Mathias Bauer <mba...@gmx.org> writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them. I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > --------------------snip--------------------
> >
> > Text export result:
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > --------------------snip--------------------
> >
> > These multiple references look quite bad. Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad? What did
> you expect instead? How is related to line breaks in the
> descriptions?

OK, let me go into more detail. Consider the Org source text:

1. There are three distinct links, and each of them appears twice. The two
   occurrences of each link have identical targets.

2. Both "[...][RFC 2440]" links fit on a single line, whereas one occurrence
   each of "[...][RFC 4880]" and "[...][RFC 1991]" has a line break in its
   description. So the pairs are really "[...][RFC\n4880]" and
   "[...][RFC 4880]" and, respectively, "[...][RFC\n1991]" and
   "[...][RFC 1991]".

Now let's examine the Org text export:

The final reference part (the five entries below the paragraph) lists
[RFC 4880] and [RFC 1991] twice each, but [RFC 2440] only once. This is,
at the very least, inconsistent.

The point is that Org evidently treats "[...][RFC 4880]" and
"[...][RFC\n4880]" as two different links internally and lists both of
them in the reference part. For this listing, the \n is removed. This is
what I called "normalization" in my first post. To human eyes, however,
the two forms look identical, so the duplicate entries come as a surprise.

I would expect Org to do the following while parsing the source text:

1. "Normalize" the link description, i.e. remove leading and trailing
   spaces, and replace every interior run of spaces, tabs and newlines
   by a single space. (To be done.)

2. Check the (description, target) tuples for duplicates and drop them.
   (This part seems OK already.)

3. Below the paragraph, list the remaining tuples as "[description] target"
   in the order of their first occurrence in the original text. (This also
   seems OK.)

I hope this makes the issue a little clearer.

Kind regards,
Mathias
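To make the proposal concrete, here is a minimal Python sketch of the three steps. It is not Org's export code (which is Emacs Lisp); it only models each link as a (description, target) pair, and the names LINKS, normalize and reference_list are hypothetical. It also assumes, as the export output suggests, that the current duplicate check compares the raw description, line breaks included.

```python
import re

# Hypothetical model: each link is a (description, target) pair, in the
# order it occurs in the Org source. Descriptions keep their raw line breaks.
LINKS = [
    ("RFC\n4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC\n1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
    ("RFC 4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC 1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
]


def normalize(description: str) -> str:
    """Step 1: collapse every run of whitespace (spaces, tabs, newlines)
    into one space and strip leading/trailing whitespace."""
    return re.sub(r"\s+", " ", description).strip()


def reference_list(links, normalize_first: bool):
    """Steps 2 and 3: drop duplicate (description, target) pairs and return
    the survivors as "[description] target" in order of first occurrence.

    With normalize_first=False the duplicate check uses the raw description,
    which is the assumed current behavior.
    """
    seen = set()
    refs = []
    for description, target in links:
        key = (normalize(description) if normalize_first else description, target)
        if key in seen:
            continue
        seen.add(key)
        # The printed description is cleaned either way, as in the export.
        refs.append(f"[{normalize(description)}] {target}")
    return refs


if __name__ == "__main__":
    print("Raw descriptions as keys:")
    print("\n".join(reference_list(LINKS, normalize_first=False)))
    print("\nNormalized descriptions as keys:")
    print("\n".join(reference_list(LINKS, normalize_first=True)))
```

With raw descriptions as keys, the model reproduces the five entries of the export above (RFC 4880 and RFC 1991 twice, RFC 2440 once); normalizing before the duplicate check leaves one entry per link.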