Max Nikulin <[email protected]> writes:

>> +*** Trailing =-= is now allowed in plain links
>
> After a look into
>
> 7dcb1afb6 2021-03-24 21:27:24 +0800 Ihor Radchenko: Improve
> org-link-plain-re
>
> I suspect, it worked prior to v9.5. Without a unit test it may be
> accidentally broken again.

No, it did not work. If you can, please do not make such assertions
without testing.

>> +: https://domain/test-
>
> example.org, example.net, example.com are domains reserved for usage in
> examples:
> <https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>

And so?

>> (or (regexp "[^[:punct:] \t\n]")
>
> I have realized that some Org regexps use [:punct:] *regexp class* and
> others *syntax class*, see latex math regexp. I am in doubts if the
> discrepancy is intentional.

It is not intentional, but using syntax classes can sometimes be fragile.

> I have noticed that the following change
>
> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
> Improve regexp heuristics
>
> that causes
>
> (link http://example.org/a<b)
>
> input is exported as
>
> <p>
> (link <a
> href="http://example.org/a%3Cb)">http://example.org/a%3Cb)</a></p>
>
> I expect that ")" should not be parsed as a part of the link. Balanced
> brackets are tricky with regexps (and it is not possible to match
> arbitrary nested ones).

It is a heuristic. We cannot be 100% right. So, it is what it is.

> Perhaps "[^[:punct:] \t\n]" is too strict in respect to spaces. It does
> not allow the recommended workaround with zero width space:

You don't need a zero-width space for links. Just use <bracket link>.

> As to the original bug report, while reading it, I noticed that
> thunderbird includes dash into the recognized link for
>
> "https://domain/test-"
>
> I decided to look into its implementation and to my surprise I found:
> ``punctation chars and "-" at the end are stipped off.'' I realized that
> double quotes along with angle brackets are treated as a recommended way
> to mark URLs in plain text. Thunderbird does not consider dash as a part
> of links for e.g. http://example.org/t- It might be an attempt to
> reserve possibility to assemble URLs wrapped into several lines with
> added hyphenation marks, but it has not been implemented (RFC2396
> appendix E warns about accidentally added hyphens).
>
> https://www.bucksch.org/1/projects/mozilla/16507/
> https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#line-243
> mozTXTToHTMLConv::FindURLEnd
>
> Implementation is tricky, I have not noticed anything that may be reused
> to improve heuristics for Org. Nowadays it is likely better to inspect
> autolinking code for GitHub/GitLab or widely used python packages.

If you have concrete proposals, please share them.

> I would consider [:space:] or \s-.

Do you mean "[^[:punct:][:space:]\t\n]"?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
