[jira] [Commented] (ROL-2090) Lucene integration doesn't work well for entries that written in some languages

ASF GitHub Bot (JIRA) Fri, 23 Mar 2018 23:04:10 -0700

    [ 
https://issues.apache.org/jira/browse/ROL-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412442#comment-16412442
 ]


ASF GitHub Bot commented on ROL-2090:
-------------------------------------

GitHub user lbtc-xxx opened a pull request:

    https://github.com/apache/roller/pull/10

    Make some Lucene configuration adjustable

    https://issues.apache.org/jira/browse/ROL-2090

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lbtc-xxx/roller ROL-2090

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/roller/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10
    
----
commit c17f7b33d1927e4bfbf476456e0100e432076484
Author: Kohei Nozaki <kohei@...>
Date:   2018-03-24T06:01:10Z

    Make some Lucene configuration adjustable

----


> Lucene integration doesn't work well for entries that written in some 
> languages
> -------------------------------------------------------------------------------
>
>                 Key: ROL-2090
>                 URL: https://issues.apache.org/jira/browse/ROL-2090
>             Project: Apache Roller
>          Issue Type: Improvement
>          Components: Data Model & JPA Backend
>    Affects Versions: 5.1.2
>            Reporter: Kohei Nozaki
>            Assignee: Roller Unassigned
>            Priority: Minor
>         Attachments: ROL-2090.patch
>
>
> Reported in 
> http://benzaiten.dyndns.org/roller/ugya/entry/roller_500_to_510_migration 
> (Japanese). Summary in English:
> h4. Japanese keywords don't match in the latter part of a long entry
> This is caused by the maximum token limit in the following code. The author 
> said that typical Japanese text is not split by whitespace, so the limit does 
> not work well with it.
> {noformat}
> // Limit to 1000 tokens.
> LimitTokenCountAnalyzer analyzer = new LimitTokenCountAnalyzer(
>         IndexManagerImpl.getAnalyzer(), 1000);
> {noformat}
> h4. StandardAnalyzer doesn't work well with Japanese text
> Roller uses {{StandardAnalyzer}}, but there are language-specific 
> alternatives to it such as {{CJKAnalyzer}} or {{JapaneseAnalyzer}}. The 
> author said that these implementations improve accuracy for such languages. I 
> know these implementations are language-specific, so we can't simply replace 
> {{StandardAnalyzer}} with them, but we might want to make the analyzer 
> switchable in a flexible manner, such as via a language setting on each blog.
> {noformat}
> public static final Analyzer getAnalyzer() {
>     return new StandardAnalyzer(FieldConstants.LUCENE_VERSION);
> }
> {noformat}
> I'm still not sure what the proper solution would be, but I believe we have 
> room for some improvement here. Any advice would be appreciated.
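
To illustrate the first point in the quoted description, here is a minimal, stdlib-only sketch of why a fixed token cap reaches less far into Japanese text. It is not Lucene code. It assumes, as a simplification, that each Han, Hiragana, or Katakana character becomes its own token while a run of other letters or digits forms a single token. The class name `TokenLimitSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TokenLimitSketch {

    // Simplified stand-in for an analyzer wrapped by a token-count limit:
    // stops emitting tokens once maxTokens have been produced.
    static List<String> tokenize(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length() && tokens.size() < maxTokens) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                // Assumption: every CJK character is a token of its own.
                flush(word, tokens);
                if (tokens.size() < maxTokens) {
                    tokens.add(new String(Character.toChars(cp)));
                }
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                // Whitespace and punctuation end the current word.
                flush(word, tokens);
            }
        }
        if (tokens.size() < maxTokens) {
            flush(word, tokens);
        }
        return tokens;
    }

    static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    static boolean isCjk(int cp) {
        Character.UnicodeScript script = Character.UnicodeScript.of(cp);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA;
    }

    public static void main(String[] args) {
        int limit = 8; // small cap so the effect is visible on short input
        // Eight whitespace-separated words: the whole sentence fits in the cap.
        System.out.println(tokenize("roller is a blog server written in java", limit));
        // Seventeen characters, no spaces: the cap cuts the text after eight characters.
        System.out.println(tokenize("検索が日本語の記事でうまく動かない", limit));
    }
}
```

Under these assumptions the cap is consumed per character rather than per word, so keywords near the end of a long Japanese entry never reach the index.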

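As a rough illustration of what "adjustable" could mean for the two hard-coded values quoted from the issue, the sketch below reads a token limit and an analyzer class name from `java.util.Properties`, falling back to the quoted defaults. It is not the content of the pull request, which is not shown in this message. The property keys and the class `SearchSettingsSketch` are hypothetical names invented for the example, and the code only resolves settings; it does not instantiate any Lucene class.

```java
import java.util.Properties;

class SearchSettingsSketch {

    // Defaults mirror the hard-coded values quoted in the issue description.
    static final int DEFAULT_MAX_TOKENS = 1000;
    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Hypothetical property keys, not taken from Roller's configuration.
    static final String KEY_MAX_TOKENS = "search.index.maxTokenCount";
    static final String KEY_ANALYZER = "search.index.analyzerClass";

    final int maxTokenCount;
    final String analyzerClassName;

    SearchSettingsSketch(Properties props) {
        this.maxTokenCount =
                parsePositiveInt(props.getProperty(KEY_MAX_TOKENS), DEFAULT_MAX_TOKENS);
        String name = props.getProperty(KEY_ANALYZER);
        this.analyzerClassName =
                (name == null || name.trim().isEmpty()) ? DEFAULT_ANALYZER : name.trim();
    }

    // Returns fallback when the value is missing, malformed, or not positive.
    static int parsePositiveInt(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY_MAX_TOKENS, "100000");
        props.setProperty(KEY_ANALYZER, "org.apache.lucene.analysis.ja.JapaneseAnalyzer");
        SearchSettingsSketch settings = new SearchSettingsSketch(props);
        System.out.println(settings.maxTokenCount + " " + settings.analyzerClassName);
    }
}
```

Falling back to the existing defaults keeps behaviour unchanged for installations that set neither property, which seems the safer choice for a site-wide setting.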


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
