[jira] [Commented] (LUCENE-9030) Solr- and WordnetSynonymParser behaviour differs

Alan Woodward (Jira) Wed, 13 Nov 2019 06:41:07 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973396#comment-16973396
 ]


Alan Woodward commented on LUCENE-9030:
---------------------------------------

Thanks for opening this fix, [~cbuescher] - I'm just running precommit now and 
will merge it in once that check passes.

> Solr- and WordnetSynonymParser behaviour differs
> ------------------------------------------------
>
>                 Key: LUCENE-9030
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9030
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.2
>            Reporter: Christoph Büscher
>            Assignee: Alan Woodward
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Equivalent synonyms are showing up with different token types and ordering 
> depending on whether the Solr format or the Wordnet format is used. A synonym 
> set like
> "woods, wood, forest" in Solr format leads to the following token stream 
> (term and type) when analyzing the term "forest":  
> "forest"/word, "woods"/SYNONYM, "wood"/SYNONYM
>  
> The following set in Wordnet format should give the same output (all terms 
> are in the same synset), however all tokens are of type SYNONYM here and the 
> original input token "forest" isn't the first one:
> synonyms.txt:
> {code:java}
> s(100000001,1,'woods',n,1,0)
> s(100000001,2,'wood',n,1,0)
> s(100000001,3,'forest',n,1,0){code}
> Token stream (term/type) when analyzing the term "forest":
> "woods"/SYNONYM, "wood"/SYNONYM, "forest"/SYNONYM
> I don't think this is intentional, and it is confusing (especially because the
> "original" input token type gets lost). I saw that the way the synsets are
> added to the SynonymMap in the respective parsers differs, and I have a PR that
> changes this.
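
For readers skimming the report, the two outputs described above can be reproduced with a small toy model. This is only an illustrative sketch in Python and does not use or mirror Lucene's parser classes or SynonymMap. The function names (parse_solr, parse_wordnet, expand_keep_original, expand_all_synonyms) are hypothetical and were chosen for this example. The only assumptions are the line shape of the Wordnet prolog format, s(synset_id, w_num, 'word', ss_type, sense_number, tag_count), and the two token streams quoted in the issue.

```python
import re
from collections import defaultdict

# One Wordnet prolog line: s(synset_id, w_num, 'word', ss_type, sense_number, tag_count)
_WORDNET_LINE = re.compile(r"s\((\d+),(\d+),'([^']*)',(\w),(\d+),(\d+)\)")


def parse_solr(line):
    """Split a Solr-format equivalence line into its terms, in file order."""
    return [term.strip() for term in line.split(",")]


def parse_wordnet(text):
    """Group words by synset ID, keeping file order inside each synset."""
    synsets = defaultdict(list)
    for raw in text.splitlines():
        match = _WORDNET_LINE.fullmatch(raw.strip())
        if match:  # silently skip lines that do not look like s(...) facts
            synsets[match.group(1)].append(match.group(3))
    return dict(synsets)


def expand_keep_original(synset, term):
    """Model of the stream reported for the Solr format:
    the input term comes first with type 'word', the rest are SYNONYM."""
    if term not in synset:
        return [(term, "word")]
    return [(term, "word")] + [(w, "SYNONYM") for w in synset if w != term]


def expand_all_synonyms(synset, term):
    """Model of the stream reported for the Wordnet format:
    every member of the synset is emitted as SYNONYM, in synset order."""
    if term not in synset:
        return [(term, "word")]
    return [(w, "SYNONYM") for w in synset]


if __name__ == "__main__":
    wordnet_text = (
        "s(100000001,1,'woods',n,1,0)\n"
        "s(100000001,2,'wood',n,1,0)\n"
        "s(100000001,3,'forest',n,1,0)"
    )
    solr_set = parse_solr("woods, wood, forest")
    wordnet_set = parse_wordnet(wordnet_text)["100000001"]

    # Both formats describe the same synset ...
    print(solr_set == wordnet_set)
    # ... but the two reported behaviours differ in order and token type.
    print(expand_keep_original(solr_set, "forest"))
    print(expand_all_synonyms(wordnet_set, "forest"))
```

The point of the sketch is that both inputs describe the same set of terms, so the difference comes only from how each set is expanded: the first expansion keeps the original token and its type, the second does not.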



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
