[The Java Posse] Re: Dick, that's not how you compare strings!

Christian Catchpole Thu, 12 Aug 2010 03:35:22 -0700

They look like search algorithms (for things like databases), rather
than Unicode case-insensitive comparisons (based on my quick look
at the site).


On Aug 12, 6:10 pm, Amarjeet Singh <[email protected]> wrote:
> And what on earth are these algorithms for string comparison then?
>
> http://www-igm.univ-mlv.fr/~lecroq/string/index.html
>
> Reg
>
> On Mon, Aug 9, 2010 at 10:29 AM, Dick Wall <[email protected]> wrote:
> > I can't help but feel that the discussion has got a little bit lost in
> > the rough :-). I do wish I had pulled a better example out for that
> > original post, but lest anyone not remember, the point was to show how
> > closures (and in particular good language support for them) greatly
> > cuts boilerplate and enhances readability. I could have used an
> > example with some genetic calculation code or something like that, but
> > it would have needed far more supporting material. Point is, Java
> > exhibits its own ugly backwaters of complexity, and they tend to be in
> > features we use all the time (like anonymous inner classes).
>
> > Dick
>
> > On Aug 8, 3:23 pm, Reinier Zwitserloot <[email protected]> wrote:
> >> So close.
>
> >> Java's own String.CASE_INSENSITIVE_ORDER uses this tactic, and as far
> >> as case-insensitive tactics go, this really isn't such a bad one.
> >> However, they completely bollocks it up by doing this
> >> character-by-character for some completely unfathomable reason. This
> >> is dumb, and explains why STRASSE and straße aren't equal.
> >> Character.toUpperCase('\u00DF') can't very well return "SS", so it has
> >> to return the Unicode codepoint for capital eszett.
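
A small sketch of the difference described above, for anyone who wants
to try it. It uses only the JDK; the class name EszettDemo and its two
helper methods are made up for illustration. As far as I can tell,
Character.toUpperCase('\u00DF') hands the eszett back unchanged (a
char-returning method has no way to produce two characters), while
String.toUpperCase applies the full mapping and expands it to "SS".

```java
import java.util.Locale;

// Hypothetical demo class, not taken from the thread.
class EszettDemo {
    // Character-by-character comparison, as String.CASE_INSENSITIVE_ORDER does it.
    static boolean perCharEqual(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b) == 0;
    }

    // Whole-string comparison: upper-case each string first, then compare.
    static boolean wholeStringEqual(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String upper = "STRASSE";
        String lower = "stra\u00DFe"; // "strasse" spelled with U+00DF

        System.out.println("per character: " + perCharEqual(upper, lower));
        System.out.println("whole string:  " + wholeStringEqual(upper, lower));
        System.out.println("eszett unchanged by Character.toUpperCase: "
                + (Character.toUpperCase('\u00DF') == '\u00DF'));
    }
}
```

Locale.ROOT is used here only to keep the result independent of the
machine's default locale; that choice is mine, not something from the
original post.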
>
> >> Nevertheless, as someone else has pointed out to me, both großman and
> >> grossman are somewhat common German surnames and shouldn't be
> >> considered equal, so, in many ways, yes, 'case insensitive' as a
> >> concept doesn't really make sense beyond English.
>
> >> Doing a canonical comparison to answer the question: "Are these
> >> strings most likely intended to be equal considering that they are
> >> both written in language X", is completely valid though, and that's
> >> exactly what java.text.Collator is for. I don't think this is mission
> >> impossible. It's just crazy complicated.
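
For reference, a minimal sketch of what a locale-aware comparison with
java.text.Collator might look like. The class and method names are
hypothetical, and the choice of Collator.PRIMARY strength (ignore case
and accent differences) is my assumption about what "intended to be
equal" should mean here; a real application may want SECONDARY or
TERTIARY instead.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not taken from the thread.
class CollatorDemo {
    // Builds a collator for the locale that only looks at base-letter differences.
    static Collator caseBlind(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // case and accents are ignored
        return collator;
    }

    // True when the collator for this locale ranks the two strings as equal.
    static boolean sameText(Locale locale, String a, String b) {
        return caseBlind(locale).compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameText(Locale.ENGLISH, "Gerbils", "GERBILS"));

        // The answer for this pair depends on the German collation rules
        // shipped with the JDK in use, so no expected output is claimed.
        System.out.println(sameText(Locale.GERMAN, "STRASSE", "stra\u00DFe"));
    }
}
```

Whether a given collator treats the two surnames above as equal is
likewise up to the locale's rules and the strength you pick, which is
rather the point: the answer is a per-language policy, not a property
of the characters.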
>
> >> Many props to A McDowell for teaching us all about the case folding
> >> rules of unicode. I learned something new.
>
> >> On Aug 8, 9:34 am, Christian Catchpole <[email protected]>
> >> wrote:
>
> >> > So, without some kind of case translation dictionary that can be
> >> > trusted on the particular strings we want to test, can we assume
> >> > that it's not actually a solvable problem? (because, like divide by
> >> > zero, the question isn't valid to start with)
>
> >> > Could you maybe get better results by (if upperCompare ||
> >> > lowerCompare)?
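
Read literally, the (upperCompare || lowerCompare) idea could be
sketched like this. The class and method names are invented for the
example, and Locale.ROOT is my assumption to keep the conversion
independent of the default locale.

```java
import java.util.Locale;

// Hypothetical demo class, not taken from the thread.
class LooseCompare {
    // Equal if the strings match after upper-casing OR after lower-casing.
    static boolean looseEquals(String a, String b) {
        boolean upperCompare =
                a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
        boolean lowerCompare =
                a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
        return upperCompare || lowerCompare;
    }

    public static void main(String[] args) {
        System.out.println(looseEquals("GERBILS", "gerbils"));
        System.out.println(looseEquals("STRASSE", "stra\u00DFe"));
        System.out.println(looseEquals("gerbils", "hamsters"));
    }
}
```

Because either match is enough, this is strictly more permissive than
either conversion alone, so it can only add matches, never remove the
questionable ones.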
>
> >> > Was I serious for a second there?
>
> >> > GERBILS!
>
> >> > That's better.
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "The Java Posse" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/javaposse?hl=en.
>
> --
> Amarjeet Singh
> Phone: +91-98712-76661
