Bill de hÓra wrote:
+1
If you want characters from non-ASCII charsets, %-encode them.

>   Slug: a-picture-of-my-house

I think it would be good if the example actually used characters that are
not in ISO8859.

I wouldn't.

Well, if the spec does all that hand-waving about RFC2047, it would be nice if it also showed a case where it's used.

Slug: Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir
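For reference, a header value like the one above is what you get by percent-encoding the UTF-8 bytes of the title. A minimal sketch, assuming a client written in JavaScript; `encodeSlugValue` is a hypothetical helper name, not something from the spec:

```javascript
// Hypothetical helper: percent-encode the UTF-8 bytes of a suggested title
// so that the header value stays pure ASCII.
function encodeSlugValue(title) {
    // encodeURIComponent encodes as UTF-8 and leaves "_" and "-" untouched.
    return encodeURIComponent(title);
}

// \u00e7 is c-cedilla and \u00e9 is e-acute, written as escapes so the
// example does not depend on the source file's encoding.
var encoded = encodeSlugValue("Les_Fran\u00e7ais_ont_gagn\u00e9_hier_soir");
console.log("Slug: " + encoded);
// prints "Slug: Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir"
```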

This is missing the point, if we are talking about slugs. If you go and look at what tools actually *do*, you'll see they strip down labels by removing certain characters so they can be dropped into URLs with no fuss and with consistency. So if you send in this:

  Slug: a-picture-of-my-house

most of the time, your URL is going to look like this:

  */a_picture_of_my_house

along with any other bits the slug code uses to create the link. Some examples follow.

I do understand that the slug is just a suggestion. We still need a way to get non-ASCII characters into the suggested text.

Here's the urlify.js from Django's admin console:

[[[
function URLify(s, num_chars) {
    // changes, e.g., "Petty theft" to "petty-theft"
    // remove all these words from the string before urlifying
    removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
                  "is", "in", "into", "like", "of", "off", "on", "onto", "per",
                  "since", "than", "the", "this", "that", "to", "up", "via",
                  "with"];
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-A-Z0-9\s]/gi, '');  // remove unneeded chars
    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = s.toLowerCase();             // convert to lowercase
    return s.substring(0, num_chars);// trim to first num_chars chars
}
]]]


If you type my surname into a Plone site that uses rename_after_creation to create a url slug for the page, it will drop Ó to o.

That may be a reasonable behavior for the server today, although I certainly wouldn't want to spec it.

If you type my surname into a Movable Type blog, it will drop the Ó altogether.

That one I'd call a bug.
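The two behaviors just described can be contrasted in a few lines. This is only a sketch of the two strategies, not the actual Plone or Movable Type code; `foldDiacritics` and `dropNonAscii` are hypothetical names, and the folding assumes a JavaScript engine that supports `String.prototype.normalize`:

```javascript
// Strategy 1: fold accented letters to their base letter.
// NFD splits "\u00d3" (O-acute) into "O" plus a combining accent (U+0301),
// and the regex then removes the combining marks.
function foldDiacritics(s) {
    return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Strategy 2: simply delete every non-ASCII character.
function dropNonAscii(s) {
    return s.replace(/[^\x00-\x7f]/g, "");
}

var surname = "de h\u00d3ra";
console.log(foldDiacritics(surname).toLowerCase()); // prints "de hora"
console.log(dropNonAscii(surname).toLowerCase());   // prints "de hra"
```

Folding keeps the word recognizable, while dropping silently loses a letter.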

If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Movable Type blog, it will come back with:

 /les_fran-c3-a7ais_ont_gagn-c3-a9_hier_soir

Well, we're here to specify a good protocol, not to write down what some implementations do today. If this spec specified a URI segment, servers clearly could un-percent-escape and UTF-8-decode the value, then extract the words, then assign the URI.
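To make that sequence concrete, here is a minimal sketch of the steps (percent-decode as UTF-8, extract words, build the URI segment). It is one possible server policy, not anything the spec mandates; `slugFromHeader` is a hypothetical name, and the folding of accented letters to ASCII and the hyphen separator are assumptions made for this illustration:

```javascript
// Hypothetical server-side helper: turn a percent-encoded Slug value
// into a plain ASCII URI segment.
function slugFromHeader(value) {
    var text;
    try {
        // Un-percent-escape and decode the bytes as UTF-8 in one step.
        text = decodeURIComponent(value);
    } catch (e) {
        // Malformed escape sequences: fall back to the raw value.
        text = value;
    }
    // Assumption: fold accented letters to their base letter.
    text = text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
    // Extract the words; everything else acts as a separator.
    var words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
    // Assumption: join the words with hyphens.
    return words.join("-");
}

console.log(slugFromHeader("Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir"));
// prints "les-francais-ont-gagne-hier-soir"
console.log(slugFromHeader("A Picture of my  House"));
// prints "a-picture-of-my-house"
```

A server that wanted to keep the non-ASCII letters could skip the folding step and percent-encode the result again instead.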

If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Movable Type blog, it will come back with:

http://www.dehora.net/journal/2006/07/les_franc3a7ais_ont_gagnc3a9_hier_soir.html

Again, that's because it doesn't expect a URI fragment. That doesn't mean that using one would be bad; it just means that the server's implementation would need to reflect what the spec says.

I don't have a copy of WordPress handy to see what it does.

-1 if we're going to redefine/expand what slugging actually means or inject new requirements on tools by way of spec riders.

A better example would be:

   Slug: A Picture of my  House

and let the server sort it out. Anything beyond that, and we should re-open URL templates.

I'm fine with not using URI syntax here, but at the end of the day, the spec should not only say how to handle non-ISO8859 characters, but also present a matching example.

Best regards, Julian
