[ https://issues.apache.org/jira/browse/CODEC-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Pompl updated CODEC-107:
-----------------------------

    Description: 
The current userguide (http://commons.apache.org/codec/userguide.html) lists only 
four Language Encoders, but there are five at the moment. CODEC-106 
implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the userguide to show a simple 
performance measurement:

_SNIP_

org.apache.commons.codec.language.Metaphone encodings per msec: 327
org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
org.apache.commons.codec.language.Soundex encodings per msec: 904
org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
org.apache.commons.codec.language.Caverphone encodings per msec: 5
org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289

So Soundex is the fastest encoder, and Caverphone is far slower than any other 
algorithm. The remaining four fall between 224 and 637 encodings per msec.

Checked with the following code:

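{code:java}
import org.apache.commons.codec.Encoder;
import org.apache.commons.codec.language.Caverphone;
import org.apache.commons.codec.language.ColognePhonetic;
import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.commons.codec.language.Metaphone;
import org.apache.commons.codec.language.RefinedSoundex;
import org.apache.commons.codec.language.Soundex;

public class EncoderSpeedCheck {

  private static final int REPEATS = 1000000;

  public void checkSpeed() throws Exception {
    checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
    checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
    checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
    checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
    // Caverphone runs with a tenth of the iterations used for the others.
    checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
    checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
  }

  // Times 'repeats' calls to encode() and prints the throughput.
  private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
    long start = System.currentTimeMillis();
    for (int i = 0; i < repeats; i++) {
      encoder.encode(toBeEncoded);
    }
    // Clamp to 1 ms so a very fast run cannot cause a division by zero.
    long duration = Math.max(1, System.currentTimeMillis() - start);
    System.out.println(encoder.getClass().getName() + " encodings per msec: " + (repeats / duration));
  }
}
{code}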

_SNAP_
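
The timing loop above measures a cold JVM with millisecond resolution, so the figures include JIT compilation time. A minimal sketch of a slightly more careful measurement follows, assuming a warm-up pass and System.nanoTime() are acceptable for the userguide. The class name EncoderBenchmark, the method encodingsPerMillisecond and the Function<String, String> stand-in for a codec encoder are illustrative assumptions, not part of Commons Codec.

```java
import java.util.function.Function;

/**
 * Sketch of a throughput measurement with a warm-up pass.
 * The Function<String, String> parameter is a stand-in (an assumption) for
 * a Commons Codec encoder, so the example compiles without the library.
 */
final class EncoderBenchmark {

    // Accumulates result lengths so the JIT cannot discard the encode calls.
    private static long sink;

    static double encodingsPerMillisecond(Function<String, String> encoder,
                                          String input, int warmup, int repeats) {
        // Warm-up pass: gives the JIT a chance to compile the hot path first.
        for (int i = 0; i < warmup; i++) {
            sink += encoder.apply(input).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            sink += encoder.apply(input).length();
        }
        // Clamp to 1 ns so a very fast run cannot divide by zero.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return repeats / (elapsedNanos / 1_000_000.0);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in encoder, used only to keep the sketch self-contained.
        Function<String, String> upperCase = s -> s.toUpperCase();
        double rate = encodingsPerMillisecond(upperCase, "Carlene", 100_000, 1_000_000);
        System.out.printf("stand-in encoder encodings per msec: %.1f%n", rate);
    }
}
```

With Commons Codec on the classpath, a method reference such as new Soundex()::encode should fit the Function<String, String> parameter. For numbers that are meant to be published, a dedicated harness such as JMH is likely a better choice than a hand-written loop.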

  was:
The current userguide (http://commons.apache.org/codec/userguide.html) lists only 
four Language Encoders, but there are five at the moment. CODEC-106 
implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the userguide to show a simple 
performance measurement:

_SNAP_

org.apache.commons.codec.language.Metaphone encodings per msec: 327
org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
org.apache.commons.codec.language.Soundex encodings per msec: 904
org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
org.apache.commons.codec.language.Caverphone encodings per msec: 5
org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289

So Soundex is the fastest encoder, and Caverphone is far slower than any other 
algorithm. The remaining four fall between 224 and 637 encodings per msec.

Checked with the following code:

{code:java}
  public void checkSpeed() throws Exception {
          checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
          checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
          checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
          checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
          checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
          checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
  }
  
  private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
          long start = System.currentTimeMillis();
          for ( int i=0; i<repeats; i++) {
                    encoder.encode(toBeEncoded);
          }
          long duration = System.currentTimeMillis()-start;
          System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));
  }
{code}

_SNAP_


> Enhance documentation for Language Encoders
> -------------------------------------------
>
>                 Key: CODEC-107
>                 URL: https://issues.apache.org/jira/browse/CODEC-107
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Marc Pompl
>            Priority: Minor
>             Fix For: 1.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current userguide (http://commons.apache.org/codec/userguide.html) lists only 
> four Language Encoders, but there are five at the moment. CODEC-106 
> implements a sixth one.
> It would be a good idea to complete the documentation.
> Additionally, I suggest extending the userguide to show a simple 
> performance measurement:
> _SNIP_
> org.apache.commons.codec.language.Metaphone encodings per msec: 327
> org.apache.commons.codec.language.DoubleMetaphone encodings per msec: 224
> org.apache.commons.codec.language.Soundex encodings per msec: 904
> org.apache.commons.codec.language.RefinedSoundex encodings per msec: 637
> org.apache.commons.codec.language.Caverphone encodings per msec: 5
> org.apache.commons.codec.language.ColognePhonetic encodings per msec: 289
> So Soundex is the fastest encoder, and Caverphone is far slower than any other 
> algorithm. The remaining four fall between 224 and 637 encodings per msec.
> Checked with the following code:
> {code:java}
>   private static final int REPEATS = 1000000;
>   public void checkSpeed() throws Exception {
>         checkSpeedEncoding(new Metaphone(), "easgasg", REPEATS);
>         checkSpeedEncoding(new DoubleMetaphone(), "easgasg", REPEATS);
>         checkSpeedEncoding(new Soundex(), "easgasg", REPEATS);
>         checkSpeedEncoding(new RefinedSoundex(), "easgasg", REPEATS);
>         checkSpeedEncoding(new Caverphone(), "Carlene", 100000);
>         checkSpeedEncoding(new ColognePhonetic(), "Schmitt", REPEATS);
>   }
>   
>   private void checkSpeedEncoding(Encoder encoder, String toBeEncoded, int repeats) throws Exception {
>         long start = System.currentTimeMillis();
>         for ( int i=0; i<repeats; i++) {
>                   encoder.encode(toBeEncoded);
>         }
>         long duration = System.currentTimeMillis()-start;
>         System.out.println(encoder.getClass().getName() + " encodings per msec: "+(repeats/duration));
>   }
> {code}
> _SNAP_

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
