Rupert Westenthaler created STANBOL-1178:
--------------------------------------------

             Summary: Remove 'Link Upper Case Tokens without POS tags' option from the EntityLinker
                 Key: STANBOL-1178
                 URL: https://issues.apache.org/jira/browse/STANBOL-1178
             Project: Stanbol
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


As stated in a comment of STANBOL-1049:

> As noted by Joseph M'Bimbi-Bene in
> http://markmail.org/message/erubqmhwytp7mxoa
> 
> The property
> 
> enhancer.engines.linking.linkOnlyUpperCaseTokensWithMissingPosTag
> 
> interferes with the upper case parameter ('uc={NONE/MATCH/LINK}')
> supported by the Text Processing configuration.
> 
> To avoid this it needs to be investigated if the functionality described by
> this issue can also be implemented by using the
> 'enhancer.engines.linking.minSearchTokenLength' property in combination
> with the value of the 'uc' parameter of the text processing configuration.

Because of this, the 'linkOnlyUpperCaseTokensWithMissingPosTag' option should be 
removed and the existing 'uc' parameter should be changed to work similarly to 
'linkOnlyUpperCaseTokensWithMissingPosTag'.

This will change the 'uc' parameter to a boolean switch. If enabled, it will 
promote the processing state of upper case tokens as follows:

* NONE -> MATCH
* MATCH -> LINK
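
The promotion described above could be sketched roughly as follows. This is only an illustration of the intended behaviour, not code from Stanbol: the class name `UpperCaseSwitchSketch`, the `TokenState` enum and the `apply` method are hypothetical, and it assumes that a token already in the LINK state stays in LINK.

```java
// Hypothetical sketch; none of these names are taken from the Stanbol code base.
public class UpperCaseSwitchSketch {

    /** Processing states of a token, ordered from weakest to strongest. */
    public enum TokenState { NONE, MATCH, LINK }

    /**
     * Applies the proposed boolean 'uc' switch to a single token.
     *
     * @param ucEnabled value of the boolean 'uc' switch
     * @param upperCase whether the token counts as an upper case token
     * @param state     state the token would have without the switch
     */
    public static TokenState apply(boolean ucEnabled, boolean upperCase, TokenState state) {
        if (!ucEnabled || !upperCase) {
            return state; // switch disabled or lower case token: leave unchanged
        }
        switch (state) {
            case NONE:  return TokenState.MATCH; // NONE -> MATCH
            case MATCH: return TokenState.LINK;  // MATCH -> LINK
            default:    return state;            // assumption: LINK stays LINK
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(true, true, TokenState.NONE));   // prints MATCH
        System.out.println(apply(true, true, TokenState.MATCH));  // prints LINK
        System.out.println(apply(true, false, TokenState.NONE));  // prints NONE
        System.out.println(apply(false, true, TokenState.MATCH)); // prints MATCH
    }
}
```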

By default the switch will be enabled for all languages other than German 
(as in German all nouns are capitalized).
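
A minimal sketch of such a per-language default, assuming languages are identified by codes such as "en", "de" or "de-AT". The class `UpperCaseDefaultSketch` and the method `defaultUcEnabled` are hypothetical names, and treating a missing language as "enabled" is an assumption not stated in this issue.

```java
import java.util.Locale;

// Hypothetical sketch; not part of the Stanbol API.
public class UpperCaseDefaultSketch {

    /**
     * Returns the default value of the boolean 'uc' switch for a language:
     * enabled for everything except German.
     */
    public static boolean defaultUcEnabled(String language) {
        if (language == null) {
            return true; // assumption: unknown language uses the general default
        }
        String lang = language.toLowerCase(Locale.ROOT);
        // German nouns are always capitalized, so upper case carries no signal there
        return !(lang.equals("de") || lang.startsWith("de-"));
    }

    public static void main(String[] args) {
        System.out.println(defaultUcEnabled("en"));    // prints true
        System.out.println(defaultUcEnabled("de"));    // prints false
        System.out.println(defaultUcEnabled("de-AT")); // prints false
    }
}
```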

As this change will affect existing configurations it will only take place 
after the upcoming 0.12.0 release of Apache Stanbol.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
