Re: PaceSlugHeader

Julian Reschke Thu, 06 Jul 2006 05:01:22 -0700


Bill de hÓra wrote:

+1
If you want characters from non-ASCII charsets, %-encode them.

>   Slug: a-picture-of-my-house

I think it would be good if the example indeed would use characters not
in ISO8859.


I wouldn't.

Well, if the spec does all that hand waving about RFC2047, it would be nice if it would also show a case where it's used.

Slug: Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir
This is missing the point, if we are talking about slugs. If you go and look at what tools actually *do*, you'll see they strip down labels by removing certain characters so they can be dropped into URLs with no fuss and with consistency. So if you send in this:
  Slug: a-picture-of-my-house

most of the time, your URL is going to look like this:

  */a_picture_of_my_house
along with any other bits the slug code uses to create the link. Some examples follow.

I do understand that the slug is just a suggestion. We still need a way to get non-ASCII characters into the suggested text.

Here's the urlify.js from Django's admin console:

[[[
function URLify(s, num_chars) {
    // changes, e.g., "Petty theft" to "petty_theft"
    // remove all these words from the string before urlifying
    removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
                  "is", "in", "into", "like", "of", "off", "on", "onto", "per",
                  "since", "than", "the", "this", "that", "to", "up", "via",
                  "with"];
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-A-Z0-9\s]/gi, '');  // remove unneeded chars
    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = s.toLowerCase();             // convert to lowercase
    return s.substring(0, num_chars);// trim to first num_chars chars
}
]]]
If you type my surname into a Plone site that uses rename_after_creation to create a url slug for the page, it will drop Ó to o.

That may be a reasonable behavior for the server today, although I certainly wouldn't want to spec it.
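
For illustration only, the kind of accent folding described for Plone can be approximated as below. This is a sketch, not Plone's actual code: `foldAccents` is a hypothetical name, and it assumes a runtime that provides `String.prototype.normalize`.

```javascript
// Hypothetical helper; approximates "Ó becomes o", not any tool's real code.
function foldAccents(s) {
    // NFD splits an accented letter into its base letter plus combining mark(s).
    var decomposed = s.normalize('NFD');
    // Drop the combining diacritical marks (U+0300 to U+036F), keep the base
    // letters, then lowercase the result.
    return decomposed.replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

console.log(foldAccents('de hÓra'));
```

With this sketch "de hÓra" becomes "de hora". Letters that have no decomposition (ø, for instance) pass through unchanged, which is one more reason not to put such a mapping into the spec.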

If you type my surname into a movable type blog, it will drop the Ó altogether.


That one I'd call a bug.

If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a movable type blog, it will come back with:
 /les_fran-c3-a7ais_ont_gagn-c3-a9_hier_soir

Well, we're here to specify a good protocol, not to write down what some implementations do today. If this spec specified a URI segment, servers clearly could un-percent-escape and UTF-8-decode, then extract words, then assign the URI.
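
A minimal sketch of those three steps, assuming the Slug value arrives as a percent-escaped UTF-8 URI segment. The function name `slugToSegment` is hypothetical, the choice of "-" as separator and of lowercasing is arbitrary, and the Unicode property escapes in the regular expression need a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper; not taken from the spec or from any tool in this thread.
function slugToSegment(slug) {
    // Step 1: un-percent-escape. decodeURIComponent reads the escaped bytes
    // as UTF-8 and throws URIError on malformed sequences.
    var text = decodeURIComponent(slug);
    // Step 2: extract words, here taken to be runs of Unicode letters and digits.
    var words = text.match(/[\p{L}\p{N}]+/gu) || [];
    // Step 3: assign the segment: join the words, lowercase them, and
    // percent-encode again so the result is safe inside a URI path.
    return encodeURIComponent(words.join('-').toLowerCase());
}

console.log(slugToSegment('Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir'));
```

For the example slug this yields les-fran%C3%A7ais-ont-gagn%C3%A9-hier-soir, so the ç and é survive instead of turning into "c3-a7" and "c3-a9".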

If you type Les_Fran%C3%A7ais_ont_gagn%C3%A9_hier_soir into a movable type blog, it will come back with:
http://www.dehora.net/journal/2006/07/les_franc3a7ais_ont_gagnc3a9_hier_soir.html

Again, that's because it doesn't expect a URI fragment. That doesn't mean that using one would be bad, it just means that that server's implementation would need to reflect what the spec says.

I don't have a copy of wordpress handy to see what it does.
-1 if we're going to redefine/expand what slugging actually means or inject new requirements on tools by way of spec riders.
A better example would be:

   Slug: A Picture of my  House
and let the server sort it out. Anything beyond that, and we should re-open URL templates.

I'm fine with not using URI syntax here, but at the end of the day, the spec should not only say how to do non-ISO8859 characters, but also present a matching example.
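
To make that concrete, here is a rough sketch of how a client might produce such a value using the RFC2047 "B" (base64) encoded-word form. The name `encodeSlugValue` is hypothetical, the code relies on the Node.js `Buffer` class, and it does not attempt to split long values across several encoded-words.

```javascript
// Hypothetical helper; relies on the Node.js Buffer class for UTF-8 and base64.
function encodeSlugValue(text) {
    // Printable ASCII can go into the header field unchanged.
    if (/^[\x20-\x7e]*$/.test(text)) {
        return text;
    }
    // RFC 2047 encoded-word, "B" form: =?charset?B?base64-data?=
    var word = '=?UTF-8?B?' + Buffer.from(text, 'utf8').toString('base64') + '?=';
    // A single encoded-word is limited to 75 characters; longer values would
    // have to be split, which this sketch does not do.
    if (word.length > 75) {
        throw new RangeError('value too long for a single encoded-word');
    }
    return word;
}

console.log('Slug: ' + encodeSlugValue('Les Français ont gagné hier soir'));
```

An ASCII-only value such as a-picture-of-my-house passes through untouched, so only the non-ASCII case needs the extra syntax.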


Best regards, Julian
