Bill de hÓra wrote:
+1
If you want characters from non-ASCII charsets, %-encode them.
> Slug: a-picture-of-my-house
I think it would be good if the example actually used characters that
are not in ISO8859.
I wouldn't.
Well, if the spec does all that hand waving about RFC2047, it would be
nice if it also showed a case where it's used.
Slug: Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir
This is missing the point, if we are talking about slugs. If you go and
look at what tools actually *do*, you'll see they strip down labels by
removing certain characters so they can be dropped into URLs with no fuss
and with consistency. So if you send in this:
Slug: a-picture-of-my-house
most of the time, your URL is going to look like this:
*/a_picture_of_my_house
along with any other bits the slug code uses to create the link. Some
examples follow.
I do understand that the slug is just a suggestion. We still need a way
to get non-ASCII characters into the suggested text.
Here's the urlify.js from Django's admin console:
[[[
function URLify(s, num_chars) {
    // changes, e.g., "Petty theft" to "petty-theft"
    // remove all these words from the string before urlifying
    var removelist = ["a", "an", "as", "at", "before", "but", "by", "for",
                      "from", "is", "in", "into", "like", "of", "off", "on",
                      "onto", "per", "since", "than", "the", "this", "that",
                      "to", "up", "via", "with"];
    // declared with var so the list and the regex do not leak into globals
    var r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-A-Z0-9\s]/gi, '');  // remove unneeded chars (anything non-ASCII goes too)
    s = s.replace(/^\s+|\s+$/g, '');      // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');        // convert spaces to hyphens
    s = s.toLowerCase();                  // convert to lowercase
    return s.substring(0, num_chars);     // trim to first num_chars chars
}
]]]
If you type my surname into a Plone site that uses rename_after_creation
to create a URL slug for the page, it will reduce Ó to o.
That may be a reasonable behavior for the server today, although I
certainly wouldn't want to spec it.
If you type my surname into a Movable Type blog, it will drop the Ó
altogether.
That one I'd call a bug.
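To make the difference between the two behaviours concrete, here is a minimal sketch. The helper names foldSlug and stripSlug are hypothetical, and neither is taken from Plone or Movable Type; the stripping step simply reuses the ASCII-only character class from the URLify code quoted above.

```javascript
// Sketch only: hypothetical helpers, not code from Plone or Movable Type.

// Fold accented letters to their base letters (Ó -> O), then slugify.
function foldSlug(title) {
  return title
    .normalize("NFD")                 // split "Ó" into "O" + combining acute accent
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .replace(/[^-A-Za-z0-9\s]/g, "")  // remove whatever is still not ASCII-safe
    .trim()
    .replace(/[-\s]+/g, "-")          // runs of spaces/hyphens become one hyphen
    .toLowerCase();
}

// Strip non-ASCII letters outright, with no folding step first.
function stripSlug(title) {
  return title
    .replace(/[^-A-Za-z0-9\s]/g, "")
    .trim()
    .replace(/[-\s]+/g, "-")
    .toLowerCase();
}

// "\u00d3" is the precomposed capital O with acute accent.
console.log(foldSlug("Bill de h\u00d3ra"));  // bill-de-hora
console.log(stripSlug("Bill de h\u00d3ra")); // bill-de-hra
```

Folding only helps for letters that decompose into an ASCII base plus combining marks; for scripts with no ASCII base letters, both helpers return an empty or near-empty slug.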
If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Movable Type
blog, it will come back with:
/les_fran-c3-a7ais_ont_gagn-c3-a9_hier_soir
Well, we're here to specify a good protocol, not to write down what some
implementations do today. If this spec specified a URI segment, servers
could clearly un-percent-escape and UTF-8-decode it, then extract
words, then assign the URI.
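As a rough sketch of that sequence (the function name slugToSegment, the hyphen separator and the lower-casing are all assumptions for illustration, not anything the draft specifies):

```javascript
// Hypothetical server-side helper: percent-encoded Slug value in,
// URI path segment out.
function slugToSegment(slug) {
  // Step 1: un-percent-escape and UTF-8 decode.
  // decodeURIComponent throws URIError on malformed input.
  const text = decodeURIComponent(slug);

  // Step 2: extract words, i.e. runs of Unicode letters and digits.
  const words = text
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());

  // Step 3: build the segment, re-escaping each word so the result
  // is still a valid URI path segment.
  return words.map((w) => encodeURIComponent(w)).join("-");
}

console.log(slugToSegment("Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir"));
// les-fran%C3%A7ais-ont-gagn%C3%A9-hier-soir
```

The point is only that the non-ASCII letters survive the round trip; a server would still be free to fold or drop them afterwards.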
If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a Movable Type
blog, it will come back with:
http://www.dehora.net/journal/2006/07/les_franc3a7ais_ont_gagnc3a9_hier_soir.html
Again, that's because it doesn't expect a URI fragment. That doesn't
mean that using one would be bad, it just means that that server's
implementation would need to reflect what the spec says.
I don't have a copy of WordPress handy to see what it does.
-1 if we're going to redefine/expand what slugging actually means or
inject new requirements on tools by way of spec riders.
A better example would be:
Slug: A Picture of my House
and let the server sort it out. Anything beyond that, and we should
re-open URL templates.
I'm fine with not using URI syntax here, but at the end of the day, the
spec should not only say how to send non-ISO8859 characters, but also
present a matching example.
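For illustration, here is a sketch of what such an example could look like under either of the two encodings discussed in this thread. It assumes Node.js (for Buffer); the title is a made-up one that falls outside ISO8859, and the helper names are hypothetical. Which encoding the spec should pick is exactly the open question, so this is not a proposal for the final text.

```javascript
// Sketch only, assuming Node.js. The title is a hypothetical example.
const title = "私の家の写真";

// Option 1: percent-encode the UTF-8 bytes of the title.
function percentEncodeSlug(text) {
  return encodeURIComponent(text);
}

// Option 2: an RFC 2047 encoded-word using the "B" (base64) encoding.
// RFC 2047 limits one encoded-word to 75 characters; longer titles
// would have to be split, which this sketch does not attempt.
function rfc2047Encode(text) {
  return "=?UTF-8?B?" + Buffer.from(text, "utf8").toString("base64") + "?=";
}

// Inverse of rfc2047Encode for a single UTF-8 "B" encoded-word;
// any other input is returned unchanged.
function rfc2047Decode(word) {
  const m = /^=\?UTF-8\?B\?([A-Za-z0-9+\/=]*)\?=$/i.exec(word);
  return m ? Buffer.from(m[1], "base64").toString("utf8") : word;
}

console.log("Slug: " + percentEncodeSlug(title));
console.log("Slug: " + rfc2047Encode(title));
```

Either header value is pure ASCII on the wire, and the server can recover the original title before deciding what URI to assign.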
Best regards, Julian