2014/1/15 Oliver Heger <oliver.he...@oliver-heger.de>

> On 15.01.2014 15:05, Benedikt Ritter wrote:
> > 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> >
> >> On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <brit...@apache.org>
> >> wrote:
> >>
> >>> Hi Gary,
> >>>
> >>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> >>>
> >>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <brit...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1]
> >>>>> is about introducing a new string algorithm called Jaro Winkler
> >>>>> Distance [2]. Since StringUtils already does a lot of things, I'm
> >>>>> wondering if it may make sense to introduce a new class that
> >>>>> serves as a host for more string algorithms to come. It would
> >>>>> look something like:
> >>>>>
> >>>>> StringAlgorithms.levenshteinDistance(str1, str2);
> >>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
> >>>>>
> >>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate
> >>>>> to the new class. It could be removed from StringUtils in the
> >>>>> next major release.
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> Yuck!
> >>>>
> >>>> I'd rather have one class per algo, which reminds me that [codec]
> >>>> might be a better place for things like this that 'encode' strings
> >>>> into something else.
> >>>
> >>> Both methods return a double value modeling some kind of score. They
> >>> do not encode. Maybe StringAlgorithms is the wrong name? How about
> >>> StringScore or something like that?
> >>
> >> Still wrong IMO and not OO. A single class will become another
> >> dumping-ground/kitchen-sink like StringUtils. I would not want to see
> >> one algo be a one method one liner impl and another algo be a complex
> >> 20 method job. I guess we could organize algos using nested classes
> >> like StringFoo.BarAlgo but that's not ideal. All algo classes in a
> >> new pkg is another way to go.
> >
> > We already have o.a.c.lang3.text, maybe this would fit?
> >
> > What I want to avoid is something like:
> >
> > LevenshteinDistance algo = new LevenshteinDistance();
> > double dist = algo.getDistance(str1, str2);
> >
> > If those algorithms don't have a state, it doesn't make sense to force
> > creation of an object. I like the idea of internal classes.
>
> IIUC, both algorithms do the same thing - calculating the difference (or
> similarity) of two strings - using different methods.
>
> So another option would be to extract a common interface
> (StringDifferenceMetric?) and provide the algorithms as concrete
> implementations.

This is possible, but it is a very specific approach (tied to distance
measuring). That said, I think it is a good idea to create very specific
utilities instead of generic ones like StringUtils, which do a variety of
things.

> A concrete use case could be a query engine which allows customizing its
> string matching algorithm.

Is this really a use case? It sounds very contrived to me. Have you ever
thought, "I'd like to search on Google, but I'd like the suggestions to be
matched using the Levenshtein distance algorithm"?

> If you want to avoid instantiating algorithm classes with no state, we
> could have an enum with constants representing the available algorithms.

I still favor specific methods over an additional parameter.

> Oliver
>
> >> Gary
> >>
> >>>> Gary
> >>>>
> >>>>> Benedikt
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
> >>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
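
For comparison, the three API shapes discussed in this thread can be put side by side in a rough Java sketch. This is an illustration under stated assumptions, not a proposed implementation: StringAlgorithms and StringDifferenceMetric are the names floated in the thread, while the enum name StringMetric, the method name difference, and the demo class are hypothetical. Only Levenshtein is implemented, to keep the sketch short.

```java
/** Option 2: a common interface (name from Oliver's suggestion; the method name is hypothetical). */
interface StringDifferenceMetric {
    double difference(CharSequence left, CharSequence right);
}

/** Option 3: stateless algorithms as enum constants, so callers never instantiate anything. */
enum StringMetric implements StringDifferenceMetric {
    LEVENSHTEIN {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            return StringAlgorithms.levenshteinDistance(left, right);
        }
    };
    // A JARO_WINKLER constant would be added here in the same way.
}

/** Option 1: specific static methods on a dedicated class, as in the original proposal. */
final class StringAlgorithms {
    private StringAlgorithms() {
        // utility class, not meant to be instantiated
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshteinDistance(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        // Distance from the empty prefix of left to each prefix of right.
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(current[j - 1] + 1, previous[j] + 1);
                current[j] = Math.min(insertOrDelete, previous[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }
}

final class StringMetricsDemo {
    public static void main(String[] args) {
        // Option 1: specific method, no extra parameter.
        System.out.println(StringAlgorithms.levenshteinDistance("kitten", "sitting"));

        // Options 2 and 3: the algorithm is a value that can be passed around.
        StringDifferenceMetric metric = StringMetric.LEVENSHTEIN;
        System.out.println(metric.difference("kitten", "sitting"));
    }
}
```

The static method keeps call sites short and needs no object, which is the argument for specific methods. The interface and enum variants only pay off when the algorithm has to be chosen at runtime, which is the use case questioned above.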