On Tuesday 26 September 2017 14:20:33 Night Light wrote:
> That's a nifty function. Good to know that it can be reversed.
UTF-8 encode is a function which assigns to every number in the range
0..1114111 a unique sequence of numbers in the range 0..255.
Therefore this function has a well-defined inverse: the UTF-8 decode
function.
The UTF-8 encode function turns a sequence of numbers in the range
0..1114111 into a sequence of numbers in the range 0..255 (usually a
longer one, never a shorter one). Since 0..255 lies within 0..1114111,
that output can be used again as input for the UTF-8 encode function.
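To illustrate how the sequence grows when the output is fed back in,
here is a minimal sketch using the Encode module. The sample string and
the variable names are only my choice for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample input (an assumption for illustration):
# U+00E9 (e with acute) followed by U+20AC (euro sign).
my $str = "\x{e9}\x{20ac}";

# First pass: code points 0..1114111 -> numbers 0..255.
my $once = encode('UTF-8', $str);

# Second pass: the numbers 0..255 are treated as code points again.
my $twice = encode('UTF-8', $once);

printf "%d %d %d\n", length($str), length($once), length($twice);
```

Every number above 127 takes at least two numbers in the output, so each
further pass can only keep the length or make it larger.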
And because every application of UTF-8 encode has a well-defined
inverse, you can just as easily construct the inverse of a composition
of several UTF-8 encode calls.
Take a string $str; then the following holds:
decode('UTF-8', decode('UTF-8', encode('UTF-8', encode('UTF-8', $str)))) eq
$str;
To get exactly the right result, you just need to know how many times
the UTF-8 encode function was applied.
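The same idea for an arbitrary number of passes could be sketched like
this. encode_n and decode_n are hypothetical helper names of mine, not
part of Encode, and the sample string is again just an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: apply UTF-8 encode $n times.
sub encode_n {
    my ($s, $n) = @_;
    $s = encode('UTF-8', $s) for 1 .. $n;
    return $s;
}

# Hypothetical helper: apply UTF-8 decode $n times.
sub decode_n {
    my ($s, $n) = @_;
    $s = decode('UTF-8', $s) for 1 .. $n;
    return $s;
}

# Sample input (an assumption): U+00E9 followed by U+20AC.
my $str = "\x{e9}\x{20ac}";

my $packed = encode_n($str, 3);

# Same number of decode passes: the original string comes back.
print "round trip ok\n" if decode_n($packed, 3) eq $str;

# One decode pass too few: the result is still encoded once.
print "still encoded\n" if decode_n($packed, 2) ne $str;
```

With too few decode passes you get the familiar double-encoded garbage
back; the count has to match the number of encode passes.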