[ 
https://issues.apache.org/jira/browse/LUCENE-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816891#comment-16816891
 ] 

Tomoko Uchida commented on LUCENE-8752:
---------------------------------------

Hi [~thetaphi],

The current directory structure is slightly different from what I expected, so I put 
the patch file into the {{src/tools/patches}} directory. What do you think of this?
{code:java}
lucene$ tree -L 4 analysis/kuromoji/
analysis/kuromoji/
└── src
    ├── java
    │   ├── org
    │   │   └── apache
    │   └── overview.html
    ├── resources
    │   ├── META-INF
    │   │   └── services
    │   └── org
    │       └── apache
    ├── test
    │   └── org
    │       └── apache
    └── tools
        ├── java
        │   └── org
        ├── patches
        │   └── Noun.proper.csv.patch
        └── test
            └── org
{code}

> Apply a patch to kuromoji dictionary to properly handle Japanese new era '令和' 
> (REIWA)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8752
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Minor
>
> As of May 1st, 2019, the Japanese era name ('元号', Gengo) will change to '令和' (Reiwa). 
> See this article for more details:
> [https://www.bbc.com/news/world-asia-47769566]
> Currently '令和' is split into '令' and '和' by {{JapaneseTokenizer}}. It 
> should be tokenized as one word so that Japanese texts including era names 
> can be searched as users expect. Because the default Kuromoji dictionary 
> (mecab-ipadic) has not been maintained since 2007, a one-line patch to the 
> source CSV file is needed for this era change.
> Era names are used in many official or formal documents in Japan, so it would 
> be desirable for search systems to handle this properly without requiring a user 
> dictionary or phrase queries. :)
> FYI, the JDK Date-Time API will support the new era in upcoming updates.
> [https://blogs.oracle.com/java-platform-group/a-new-japanese-era-for-java]
> The patch is available here:
> [https://github.com/apache/lucene-solr/pull/632]
>  
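To illustrate why a missing dictionary entry causes the split described above, here is a minimal sketch of dictionary-driven segmentation. It is a toy greedy longest-match tokenizer, not Kuromoji's actual lattice-based algorithm, and it does not use any Lucene API. The class name {{ToyDictionaryTokenizer}}, the {{tokenize}} method, and the tiny in-memory dictionary (standing in for mecab-ipadic) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyDictionaryTokenizer {

    // Greedy longest-match segmentation (an assumption for illustration only):
    // at each position, take the longest dictionary entry that starts there,
    // and fall back to a single character when nothing matches.
    public static List<String> tokenize(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String entry : dictionary) {
            maxLen = Math.max(maxLen, entry.length());
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1; // fallback: emit one character
            for (int len = Math.min(maxLen, text.length() - pos); len > 1; len--) {
                if (dictionary.contains(text.substring(pos, pos + len))) {
                    end = pos + len;
                    break;
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary that predates the new era name.
        Set<String> dictionary = new HashSet<>(List.of("元年", "平成"));
        System.out.println(tokenize("令和元年", dictionary)); // [令, 和, 元年]

        // Adding a single entry plays the role of the one-line CSV patch.
        dictionary.add("令和");
        System.out.println(tokenize("令和元年", dictionary)); // [令和, 元年]
    }
}
```

In this sketch, adding one entry is enough for the era name to come out as a single token, which mirrors the intent of the one-line dictionary patch without modeling how Kuromoji scores candidate segmentations.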



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
