Hi all, IMO this discussion is about whether the unicode.* libraries are a replacement for string or not.
If they are a replacement for string, then they must preserve its semantics. For example, string.find must be able to find bytes and return byte positions, because a string can also hold binary data. I don't think that point is up for debate. So the question is: are the unicode.* libraries meant as a drop-in replacement for string, so that one can write, for example:

    if input == "utf8" then
      string = unicode.utf8
    elseif input == "latin1" then
      string = unicode.latin1
    end
    result = string.whatever()

When I look at the source code of the selene library, it seems perfectly clear to me that it is meant as a drop-in replacement:

a) It covers exactly the same functions as string.*.
b) The only changes are the extended character classes and the fact that lengths are counted in characters rather than bytes for non-byte operations (for example string.len() vs. #str).
c) Everything else behaves exactly like string.
d) It even mentions that it can be used as a replacement.

So if it is a replacement, changing the find function would break everything that deals with binary data.

Please let's not be quick to call the unicode library broken: this is a deliberate design decision, and to me it makes perfect sense. And with the combination of *.len and *.sub, as I showed in a previous mail, everything that has been requested so far can be done.

And yes, if it is *not* meant as a replacement, then I can understand that this raises questions. But in that case find should not accept bytes in the pattern and should raise an error instead.

Patrick (trying to avoid a heated discussion)
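For illustration, here is a minimal sketch of how a len/sub combination can turn the byte positions returned by find into character positions while find itself keeps its byte semantics. It is not the code from the earlier mail: it assumes Lua 5.3 or later and uses the built-in utf8.len as a stand-in for unicode.utf8.len, and char_find is a hypothetical helper name.

```lua
-- Hypothetical helper char_find. Assumes Lua 5.3+ and uses the standard
-- utf8 library as a stand-in for unicode.utf8 (not selene's API).
-- string.find keeps its byte semantics; len + sub convert its byte
-- positions into character positions afterwards.
local function char_find(s, pattern, init)
  local first, last = string.find(s, pattern, init)
  if not first then
    return nil
  end
  -- Number of characters before the match start.
  local before = utf8.len(string.sub(s, 1, first - 1))
  -- Number of characters up to and including the last matched byte.
  local upto = utf8.len(string.sub(s, 1, last))
  if not before or not upto then
    -- utf8.len returns nil when a boundary splits a multibyte sequence,
    -- e.g. when the pattern matched raw bytes of binary data.
    error("match does not fall on UTF-8 character boundaries")
  end
  return before + 1, upto
end

local s = "héllo wörld"
print(#s, utf8.len(s))          --> 13  11
print(string.find(s, "wörld"))  --> 8   13  (byte positions)
print(char_find(s, "wörld"))    --> 7   11  (character positions)
```

Because the conversion is layered on top of a byte-based find, binary data keeps working unchanged; only callers that want character positions pay for the extra len/sub calls.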
