# Re: [NTG-context] modifying URL wrapping rules

On 11/18/2008 3:44 PM, Arthur Reutenauer wrote:
>> From what I can tell, the .tex file loads one of the other three:
>
> on the ConTeXt version you're running (MkII / MkIV).  In Mark IV, the
> Lua code is then put in lang-url.lua, which is input by lang-url.mkiv
> (you can see "\registerctxluafile{lang-url}{1.001}" near the beginning
> of the latter).  This architecture enables you to reuse the Lua code in
> completely different environments (for example, in a pure Lua script).
>
>> Our project has a requirement of using Xetex, so I have to stick with
>> that. Does that mean lang-url doesn't work at all?
>
>   ConTeXt on XeTeX is considered Mark II as far as the mark business
> code as with pdfTeX; in this case, lang-url.mkii will be loaded. 

OK, I've taken a stab at it. Here is the main code now in the modified
lang-url.mkii. For brevity in this email I've omitted the lines that I
commented out in the file, namely the entries for characters that
Chicago style does not list as places where a URL may be broken.

\def\sethyphenatedurlnormal#1{\expandafter\chardef\csname url @ #1\endcsname\zerocount}
\def\sethyphenatedurlbefore#1{\expandafter\chardef\csname url @ #1\endcsname\plusone  }
\def\sethyphenatedurlafter #1{\expandafter\chardef\csname url @ #1\endcsname\plustwo  }

% Chicago Manual of Style rules:
% Break URLs after: / or // (I don't know how to implement // so will be
%      content with / for now.
%      To do: prevent breaking in middle of double slash //.)
% Break URLs before: ~ . , - _ ? # %
% Break URLs before or after: = & (I don't know how to implement 'before
%      or after' so will be content with breaking 'before' these
%      characters for now).
\sethyphenatedurlbefore \letterhash
\sethyphenatedurlbefore \letterpercent
\sethyphenatedurlbefore \letterampersand
\sethyphenatedurlbefore ,
\sethyphenatedurlbefore -
\sethyphenatedurlbefore .
\sethyphenatedurlbefore =
\sethyphenatedurlbefore ?
\sethyphenatedurlbefore _
\sethyphenatedurlbefore \lettertilde

\sethyphenatedurlafter / % was \sethyphenatedurlbefore /
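
To make the intent of the three rule types concrete, here is a minimal
plain TeX sketch of what "before", "after", and "before or after" mean in
terms of break opportunities. The macro names are hypothetical and
www.example.org is a placeholder; this is not how lang-url.mkii implements
its rules, only an illustration under the assumption that a zero penalty
is an acceptable break point.

```tex
% Hypothetical helper names, not part of lang-url.mkii.
% \penalty0 marks a place where TeX may break the line at no cost.
\def\urlbreakbefore#1{\penalty0 #1}           % break allowed before the character
\def\urlbreakafter#1{#1\penalty0 }            % break allowed after the character
\def\urlbreakeither#1{\penalty0 #1\penalty0 } % break allowed on either side (= and &)
% Keep a double slash together: there is no penalty between the two
% slashes, so the only break this macro permits is after the pair.
\def\urldoubleslash{//\penalty0 }

% Usage in a plain TeX paragraph (placeholder URL):
% http:\urldoubleslash www.example.org\urlbreakafter/page\urlbreakeither=1
```

The space after each `\penalty0` matters: it ends the number, so a digit
that follows in the URL is not read as part of the penalty value.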

However, I have a few unsolved problems here.

1) I don't see a way, with the '\sethyphenatedurlbefore' or 'after'
mechanism, to tell it not to break a URL between two slashes, as in
"http://". At first I thought that since our text only had a few URLs,
we'd likely never care. But ... you guessed it. One URL got broken
between the slashes: "http:/
/www.sil.org/..."

So I tried using the base TeX hyphenation mechanism to inhibit breaking
there: I changed the document from
\hyphenatedurl{http://www.sil.org/...}
to
\hyphenatedurl{\hyphenation{http://}www.sil.org/...}
but that gave a stack overflow.

Then I tried
\hyphenation{http://}\hyphenatedurl{www.sil.org/...}
but got this error:
! Not a letter.
<inserted text> http:
//
\hyphenation ...malhyphenation {\the \scratchtoks
}\endgroup
<argument> ... Linguistics. \hyphenation {http://}
\hyphenatedurl
{www.sil.or...

\BE #1->\startmainexdent {#1
}\stopmainexdent
l.317 ...l.org/silesr/abstract.asp?ref=2007-015}.}

I'm kind of shooting in the dark there, so maybe somebody who knows TeX
can help me out.

2) Even though I have "\sethyphenatedurlafter /" instead of
"\sethyphenatedurlbefore /", there are four cases where a URL is broken
before a slash, e.g.:
http://www.sil.org/.../009
/YAMBASSA.html.
and no cases where a URL is broken after a slash (except when it's also
before a slash -- see 1).

I wonder if my modifications are actually taking effect?
Do I need to compile the changes to the .mkii file or something? I tried
texexec.bat --make --all, but that didn't seem to change the outcome.
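
One way to check whether the edited file is being read at all would be to
have it announce itself. A minimal sketch, assuming lang-url.mkii is read
while the format is being built (in which case the message would show up
in the output of the format-generation run rather than in the document's
own log); the message text is arbitrary:

```tex
% Place near the top of the modified lang-url.mkii.
% \message is a TeX primitive that writes its argument to the terminal
% and to the log file.
\message{[lang-url.mkii: locally modified copy loaded]}
```

If the message never appears, a different copy of the file is probably
being picked up, or the format in use was not regenerated.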

3) Conversely, even though I have "\sethyphenatedurlbefore -" and not
"\sethyphenatedurlafter -", there is a case where a URL is broken after
a hyphen (a hyphen that was already present in the URL):
http://www..../Niger-
Congo/...
and no case where a URL is broken before a hyphen.
Note that the "\sethyphenatedurlbefore -" setting is unchanged from the
original lang-url.mkii, so this is not an issue of needing to recompile.

Maybe the general TeX hyphenation mechanism is operating here, in spite
of the URL breaking settings. How do I override that (only for the URL)?

4) In one case, a URL is broken over the end of a column. That's ok, but
it would be nice to be able to strongly discourage that from happening
at the end of a page. I'm told that's a difficult problem to solve. It's
not mandatory for us at this point, but if anyone has a solution I'd like
to hear it.

Thanks,
Lars