Hi all,

For a CMS I'm building, I need to filter the value of a 'page path' input field. This field lets the CMS user assign custom, SEO-friendly URI segments to the page.

I'm thinking about creating my own filter for this, but before I do, I wanted to consult this group.

What I usually do for this type of thing is replace all diacritical characters with their non-diacritical equivalents, and special characters with their expanded counterparts. This is what I have been using (found some time ago on the internet):

// Assumes $string is ISO-8859-1 encoded; the byte values below will not match UTF-8 input.
public function replaceDiacriticalChars( $string )
{

 // Single-byte replacements: each byte in the first list maps to the
 // character at the same position in the second.
 $string = strtr(
  $string,
  "\xA1\xAA\xBA\xBF\xC0\xC1\xC2\xC3\xC5\xC7\xC8\xC9\xCA\xCB\xCC"
  . "\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD8\xD9\xDA\xDB\xDD\xE0\xE1"
  . "\xE2\xE3\xE5\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3"
  . "\xF4\xF5\xF8\xF9\xFA\xFB\xFD\xFF",
  "!ao?AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiidnooooouuuyy"
 );

 // Replacements that expand a single byte into several characters.
 $string = strtr(
  $string,
  array(
   "\xC4" => "Ae",
   "\xC6" => "AE",
   "\xD6" => "Oe",
   "\xDC" => "Ue",
   "\xDE" => "TH",
   "\xDF" => "ss",
   "\xE4" => "ae",
   "\xE6" => "ae",
   "\xF6" => "oe",
   "\xFC" => "ue",
   "\xFE" => "th"
  )
 );

 return $string;

}

This does things like:

input: éèïß
output: eeiss

This does the job pretty well. I usually also replace spaces, underscores and plus signs with dashes, and so on. But I was wondering:

1. What should I be concerned about, or not, when creating valid URIs? Is this still a concern, or do URIs allow a much broader range of characters nowadays?

2. Is there already a component or filter in ZF that can create this kind of normalized URI from some input?
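
To make the dash replacement I mentioned above concrete, here is a rough sketch of that step. The function name normalizePathSegment is made up for illustration and is not a ZF component. It assumes the string has already been reduced to plain ASCII (for example by the function above), and the whitelist of a-z, 0-9 and dashes is my own conservative assumption, not a statement of what URIs actually allow:

```php
<?php
// Hypothetical helper (not part of ZF): turns already-transliterated ASCII
// text into a lowercase, dash-separated path segment.
function normalizePathSegment($string)
{
    // Treat spaces, underscores and plus signs as word separators.
    $string = strtr($string, ' _+', '---');

    // Lowercase, then drop everything except a-z, 0-9 and dashes.
    // This whitelist is an assumption, chosen to be on the safe side.
    $string = strtolower($string);
    $string = preg_replace('/[^a-z0-9-]/', '', $string);

    // Collapse runs of dashes into one and trim dashes from both ends.
    $string = preg_replace('/-{2,}/', '-', $string);

    return trim($string, '-');
}

echo normalizePathSegment('My new_page + more!'), "\n";
```

With this, 'My new_page + more!' would come out as my-new-page-more.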

Thank you in advance for any insights.
