[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-626:
-------------------------------

    Description: 
Extensive Javadocs are available in the patch, but I try to keep a compiled copy here:
http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description

Example:
{code:java}
import java.io.InputStreamReader;
import java.net.URL;
import java.util.zip.GZIPInputStream;

  public void testImportData() throws Exception {

    // load 200,000 user queries with session data and time stamps. no goals specified.

    System.out.println("Processing http://ginandtonique.org/~kalle/data/pirate.data.gz");
    importFile(new InputStreamReader(new GZIPInputStream(
        new URL("http://ginandtonique.org/~kalle/data/pirate.data.gz").openStream())));
    System.out.println("Processing http://ginandtonique.org/~kalle/data/hero.data.gz");
    importFile(new InputStreamReader(new GZIPInputStream(
        new URL("http://ginandtonique.org/~kalle/data/hero.data.gz").openStream())));
    System.out.println("Done.");

    // run some tests without the second level suggestions,
    // i.e. user behavioral data only. no ngrams or so.

    assertEquals("pirates of the caribbean", facade.didYouMean("pirates ofthe caribbean"));

    assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carribbean"));

    assertEquals("pirates caribbean", facade.didYouMean("pirates carricean"));
    assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carriben"));
    assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabien"));
    assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabbean"));

    assertEquals("pirates of the caribbean", facade.didYouMean("pirates og carribean"));

    assertEquals("pirates of the caribbean soundtrack",
        facade.didYouMean("pirates of the caribbean music"));
    assertEquals("pirates of the caribbean score",
        facade.didYouMean("pirates of the caribbean soundtrack"));

    assertEquals("pirate of caribbean", facade.didYouMean("pirate of carabian"));
    assertEquals("pirates of caribbean", facade.didYouMean("pirate of caribbean"));
    assertEquals("pirates of caribbean", facade.didYouMean("pirates of caribbean"));

    // depending on how many hits and goals are noted with these two queries,
    // perhaps the delta should be added to a synonym dictionary?
    assertEquals("homm iv", facade.didYouMean("homm 4"));

    // not yet known, and we have no second level yet.
    assertNull(facade.didYouMean("the pilates"));

    // use the dictionary built from user queries to build the token phrase and ngram suggester.
    facade.getDictionary().getPrioritesBySecondLevelSuggester()
        .put(Factory.ngramTokenPhraseSuggesterFactory(facade.getDictionary()), 1d);

    // now it's learned
    assertEquals("the pirates", facade.didYouMean("the pilates"));

    // typos
    assertEquals("heroes of might and magic", facade.didYouMean("heroes of fight and magic"));
    assertEquals("heroes of might and magic", facade.didYouMean("heroes of right and magic"));
    assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and light"));

    // composite dictionary key not learned yet
    assertNull(facade.didYouMean("heroesof lightand magik"));
    // learn
    assertEquals("heroes of might and magic", facade.didYouMean("heroes of light and magik"));
    // test
    assertEquals("heroes of might and magic", facade.didYouMean("heroesof lightand magik"));

    // wrong term order
    assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and might"));

  }
{code}
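The second half of the test enables a second-level suggester built from the query dictionary, after which "the pilates" resolves to "the pirates". The patch's own ngram token phrase suggester is described in the Javadocs linked above. Purely to illustrate the general idea, the sketch below swaps in a simple technique: score every previously seen query by the Dice coefficient of its character bigrams against the input and return the best one above a threshold. The class `NgramFallbackSketch`, its `suggest` method and the `minScore` threshold are hypothetical and not part of the patch.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the patch's ngram token phrase suggester:
// picks the previously seen query whose character bigrams overlap most
// with the input, measured by the Dice coefficient.
public class NgramFallbackSketch {

    private final List<String> knownQueries; // e.g. past user queries
    private final double minScore;           // assumed cut-off below which nothing is suggested

    public NgramFallbackSketch(List<String> knownQueries, double minScore) {
        this.knownQueries = knownQueries;
        this.minScore = minScore;
    }

    // Character bigrams of the lower-cased text, padded with one space at each end.
    static Set<String> bigrams(String text) {
        String padded = " " + text.toLowerCase(Locale.ROOT) + " ";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 2));
        }
        return grams;
    }

    // Dice coefficient: 2 * |A intersect B| / (|A| + |B|).
    static double dice(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }

    // Returns the most similar known query, or null if none reaches minScore.
    public String suggest(String query) {
        Set<String> queryGrams = bigrams(query);
        String best = null;
        double bestScore = -1;
        for (String candidate : knownQueries) {
            if (candidate.equalsIgnoreCase(query)) {
                continue; // never suggest the query itself
            }
            double score = dice(queryGrams, bigrams(candidate));
            if (score >= minScore && score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NgramFallbackSketch fallback = new NgramFallbackSketch(
                List.of("the pirates", "pirates of the caribbean", "heroes of might and magic"),
                0.5);
        System.out.println(fallback.suggest("the pilates")); // prints "the pirates"
        System.out.println(fallback.suggest("homm 4"));      // prints "null"
    }
}
{code}

The 0.5 threshold in `main` is an arbitrary assumption; in the test above the suggester is instead registered together with a priority value (`1d`).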


  was:
Extensive Javadocs are available in the patch, but I try to keep a compiled copy here:
http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description

The patch spellcheck.diff should not depend on anything but Lucene trunk. It has basic support for phrase suggestions and query goal detection, but is pretty buggy and lacks features available in didyoumean.diff.bz2. The latter depends on LUCENE-550.
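The paragraph above mentions query goal detection without describing it. A minimal sketch of one way it could work, assuming that the last query of a session is that session's goal (the example data specifies no explicit goals): every earlier query in the session is mapped to the goal, and the most frequent goal is suggested. The class `SessionGoalSketch`, its methods, and the whitespace-stripping key normalization are hypothetical and not taken from the patch.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the patch's implementation: treats the last query
// of each user session as that session's goal and suggests it for the
// earlier queries of the same session.
public class SessionGoalSketch {

    // normalized key of an earlier query -> (goal query -> number of sessions)
    private final Map<String, Map<String, Integer>> goalCounts = new HashMap<>();

    // Assumed key normalization: lower case with all whitespace removed, so
    // "heroesof lightand magik" and "heroes of light and magik" share a key.
    static String key(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    // Records one session, given its queries in time order.
    public void addSession(List<String> queries) {
        if (queries.size() < 2) {
            return; // no reformulation, nothing to learn
        }
        String goal = queries.get(queries.size() - 1);
        for (String query : queries.subList(0, queries.size() - 1)) {
            if (!query.equalsIgnoreCase(goal)) {
                goalCounts.computeIfAbsent(key(query), k -> new HashMap<>())
                          .merge(goal, 1, Integer::sum);
            }
        }
    }

    // Returns the goal seen most often for the query's key, or null if none
    // is known. Ties are broken arbitrarily.
    public String didYouMean(String query) {
        Map<String, Integer> goals = goalCounts.get(key(query));
        if (goals == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : goals.entrySet()) {
            boolean isQueryItself = entry.getKey().equalsIgnoreCase(query);
            if (!isQueryItself && entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SessionGoalSketch sessions = new SessionGoalSketch();
        sessions.addSession(List.of("heroes of light and magik", "heroes of might and magic"));
        sessions.addSession(List.of("pirates ofthe caribbean", "pirates of the caribbean"));
        System.out.println(sessions.didYouMean("heroesof lightand magik"));  // prints "heroes of might and magic"
        System.out.println(sessions.didYouMean("pirates of the caribbean")); // prints "null"
        System.out.println(sessions.didYouMean("the pilates"));              // prints "null"
    }
}
{code}

In this sketch, stripping whitespace from the key is what lets the run-together spelling "heroesof lightand magik" find the goal learned from "heroes of light and magik"; it is only a loose analogy to the "composite dictionary key" comments in the example test.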


> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Minor
>         Attachments: LUCENE-626_2007_10_16.txt

-- 
This message is automatically generated by JIRA.