[
https://issues.apache.org/jira/browse/LUCENENET-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Simatov closed LUCENENET-559.
----------------------------------
Resolution: Invalid
Created by mistake in a different project.
It should have been filed as https://issues.apache.org/jira/browse/LUCENE-7379
> Search word request on Chinese is not working properly
> ------------------------------------------------------
>
> Key: LUCENENET-559
> URL: https://issues.apache.org/jira/browse/LUCENENET-559
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 5.0 PCL
> Reporter: Alex Simatov
>
> We had used Lucene 2.3 in this project for years.
> Some time ago we updated to Lucene 5.0.0.
> After that, Chinese analysis stopped working properly (I did not test Japanese or
> Korean).
> We have the following code to process the search request:
> 1. analyzer = new ClassicAnalyzer();
> 2. logger.Write2Log(queryString);
> 3. QueryParser qp = new QueryParser(fieldName, analyzer);
> 4. Query query = qp.parse(queryString);
> 5. logger.Write2Log(query.toString(fieldName));
> 6. int hits = searcher.search(query, 1).totalHits;
> The analyzer on line 1 can be changed via config.
> Line 2 prints what we pass to Lucene.
> Line 5 prints how Lucene modified the query.
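> For reference, here is a minimal, self-contained version of this flow against an
> in-memory index (the field name "body" and the sample text are made up; the plain
> Lucene 5.x Java APIs are assumed):
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.ClassicAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
>
> public class ChineseSearchRepro {
>     public static void main(String[] args) throws Exception {
>         String fieldName = "body";                   // hypothetical field name
>         Analyzer analyzer = new ClassicAnalyzer();   // line 1
>
>         // Index one document containing the word we search for (sample text)
>         Directory dir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
>         Document doc = new Document();
>         doc.add(new TextField(fieldName, "这个文件打不开", Field.Store.YES));
>         writer.addDocument(doc);
>         writer.close();
>
>         String queryString = "打不开~0.7";
>         System.out.println(queryString);             // line 2: what we put in
>
>         QueryParser qp = new QueryParser(fieldName, analyzer);   // line 3
>         Query query = qp.parse(queryString);                     // line 4
>         System.out.println(query.toString(fieldName));           // line 5: how it was rewritten
>
>         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
>         System.out.println(searcher.search(query, 1).totalHits); // line 6
>     }
> }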
> Normally we use the string 打不开~0.7 for 70% or better similarity and 打不开 to find
> exactly this word.
> The ~0.7 syntax has been deprecated since version 4.0, but it still works, at least
> for English.
> What happened before (on Lucene 2.3):
> Line 2: 打不开~0.7
> Line 5: 打不开~0.7
> If the string under analysis contains the word, line 6 returns the correct result.
> The same holds for 打不开 without the similarity suffix (without ~0.7).
> What happens now (on Lucene 5.0):
> Line 2: 打不开~0.7
> Line 5: 打不开~0
> As I understand it, the deprecated parameter is converted to the newly supported one,
> which has a slightly different meaning (at least that is how it behaves for English).
> The string under analysis contains 打不开, yet line 6 reports that nothing was found.
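> If I understand the 4.x+ semantics correctly, the fuzzy suffix is now an integer
> Levenshtein edit distance (0, 1, or 2) rather than a similarity fraction, which would
> explain why ~0.7 gets rewritten to ~0. A sketch of building the fuzzy query explicitly,
> assuming the 5.x FuzzyQuery API:
>
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.FuzzyQuery;
> import org.apache.lucene.search.Query;
>
> class FuzzyExample {
>     // maxEdits must be 0, 1, or 2 in Lucene 4.x/5.x;
>     // the equivalent parser syntax would be 打不开~2
>     static Query fuzzyFor(String fieldName, String word) {
>         return new FuzzyQuery(new Term(fieldName, word), 2);
>     }
> }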
> Without the similarity suffix (plain 打不开):
> Line 2: 打不开
> Line 5: 打 不 开
> Lucene inserted spaces, which are interpreted as the OR operator. As a result, line 6
> reports the keyword as found even if the string under analysis contains only the
> single character 不.
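> A possible workaround for this OR behavior, assuming the classic QueryParser API:
> switch the default operator to AND, or quote the word so the single-character tokens
> are kept adjacent as a phrase. A sketch:
>
> import org.apache.lucene.analysis.standard.ClassicAnalyzer;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.Query;
>
> class RequireAllTokens {
>     static Query parseChinese(String fieldName, String word) throws Exception {
>         QueryParser qp = new QueryParser(fieldName, new ClassicAnalyzer());
>         qp.setDefaultOperator(QueryParser.Operator.AND); // require every token, not OR
>         // Quoting keeps the characters adjacent as a phrase, e.g. "打不开"
>         return qp.parse("\"" + word + "\"");
>     }
> }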
> The same scenario was tested with CJKAnalyzer, ClassicAnalyzer, and
> SmartChineseAnalyzer. The results are the same: none of them matches the behavior of
> the analyzer in Lucene 2.3.
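> To see what each analyzer actually emits for 打不开 (single characters, bigrams, or
> segmented words), the token stream can be dumped; a sketch using the standard
> TokenStream API:
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>
> class DumpTokens {
>     static void dump(Analyzer analyzer, String fieldName, String text) throws Exception {
>         try (TokenStream ts = analyzer.tokenStream(fieldName, text)) {
>             CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
>             ts.reset();
>             while (ts.incrementToken()) {
>                 System.out.println(term.toString());
>             }
>             ts.end();
>         }
>     }
> }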
> Is this a known problem in the product? Could you please explain, or point to
> documentation on, how search should work for Chinese in the cases mentioned above?
> Thanks
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)