Hi all,

this discussion, IMO, comes down to whether the unicode.* libraries are a
replacement for string or not.

If they are a replacement for string, then they must preserve its semantics.
For example, string.find must be able to find bytes and return byte positions,
because a string can also hold binary data. I don't think we can argue about
this.
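For illustration, here is what that means with the standard Lua string library alone (the binary blob below is made up for the example): a plain search inside binary data returns byte positions.

```lua
-- A made-up binary blob: bytes 0, 1, 2, then the ASCII letters "PK", then bytes 3, 4.
local data = "\0\1\2PK\3\4"

-- Plain (non-pattern) search, so no pattern characters are interpreted.
-- The result is a pair of byte positions into the blob.
local first, last = string.find(data, "PK", 1, true)
print(first, last)  --> 4  5
```

Any replacement for string has to give the same answer here, otherwise code that parses binary formats breaks.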

So the question is: are the unicode.* libraries meant as a drop-in replacement
for string? So that one can write, for example:

if input=="utf8" then
  string = unicode.utf8
elseif input=="latin1" then
  string = unicode.latin1
end

result = string.whatever()   -- stands for any string.* function

When I look at the source code of the selene library, it seems perfectly clear
to me that it is meant as a drop-in replacement:

a) it covers exactly the same functions as string.*
b) the only changes are the extended character classes and the counting of
lengths in characters instead of bytes for the non-byte operations (for
example string.len() vs. #str)
c) everything else behaves exactly like string.*
d) it even states that it can be used as a replacement

So if it is a replacement, changing the find function would break everything
that deals with binary data. Please let's not be quick to call the unicode
library broken: this is a deliberate design decision, and to me it makes
perfect sense. And with the combination of *.len and *.sub, as I showed in a
previous mail, everything that has been requested so far can be done.
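A minimal sketch of the idea, in plain Lua so it runs without the library. The helper names utf8_count and char_pos are hypothetical; utf8_count stands in for unicode.utf8.len and assumes valid UTF-8 input. This is not necessarily the code from the earlier mail.

```lua
-- Hypothetical stand-in for unicode.utf8.len: counts characters by counting
-- every byte that is NOT a UTF-8 continuation byte (0x80-0xBF).
-- Assumes the input is valid UTF-8.
local function utf8_count(s)
  local _, n = string.gsub(s, "[^\128-\191]", "")
  return n
end

-- Convert a byte position (as returned by a byte-based find) into a
-- character position: count the characters before it with the len
-- stand-in applied to a byte-based sub, then add one.
local function char_pos(s, bytepos)
  return utf8_count(string.sub(s, 1, bytepos - 1)) + 1
end

local s = "na\195\175ve caf\195\169"   -- "naïve café" encoded as UTF-8
local b = string.find(s, "caf", 1, true)
print(b, char_pos(s, b))               --> 8  7
```

The find stays byte-based (safe for binary data), and the character position is derived only when it is actually needed.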


And yes, if it is *not* meant as a replacement, then I can understand that this
raises questions. But then find should not accept bytes in the pattern and
should raise an error instead.

Patrick

(trying to avoid a heated discussion)
