Hi guys,

As the subject says, it seems that the length of the field does not
accurately affect the document score when I use the Chinese analyzer
(SmartChineseAnalyzer) in my code.

Indexing code:

    private static Directory DIRECTORY;

    @BeforeClass
    public static void before() throws IOException {
        DIRECTORY = new RAMDirectory();
        Analyzer chineseAnalyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
        IndexWriterConfig indexWriterConfig =
                new IndexWriterConfig(Version.LUCENE_40, chineseAnalyzer);

        // Indexed and stored, with norms kept so that field length
        // can contribute to the score.
        FieldType nameType = new FieldType();
        nameType.setIndexed(true);
        nameType.setStored(true);
        nameType.setOmitNorms(false);

        IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
        try {
            List<String> nameList = new ArrayList<String>();
            nameList.add("咨询公司");
            nameList.add("飞鹰咨询管理咨询公司");
            nameList.add("北京中标咨询公司");
            nameList.add("重庆咨询公司");
            nameList.add("商务咨询服务公司");
            nameList.add("法律咨询公司");
            for (int i = 0; i < nameList.size(); i++) {
                Document document = new Document();
                document.add(new Field("name", nameList.get(i), nameType));
                document.add(new Field("id", String.valueOf(i + 1), nameType));
                indexWriter.addDocument(document);
            }
            indexWriter.commit();
        } finally {
            // Release the writer even if indexing fails.
            indexWriter.close();
        }
    }

Search snippet:

    @Test
    public void testChinese() throws IOException, ParseException {
        String keyword = "咨询公司";
        System.out.println("Searching for:" + keyword);
        System.out.println();
        IndexReader indexReader = DirectoryReader.open(DIRECTORY);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        Query query = new QueryParser(Version.LUCENE_40, "name",
                new SmartChineseAnalyzer(Version.LUCENE_40)).parse(keyword);
        TopDocs topDocs = indexSearcher.search(query, 15);
        System.out.println("Search Result:");
        if (null != topDocs && 0 < topDocs.totalHits) {
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                System.out.println("doc id:" + indexSearcher.doc(scoreDoc.doc).get("id"));
                String name = indexSearcher.doc(scoreDoc.doc).get("name");
                System.out.println("content of Field:" + name);
                dumpCNTokens(name);
                System.out.println("score:" + scoreDoc.score);
                System.out.println("-------------------------------------------");
            }
        } else {
            System.out.println("no results");
        }
    }


The search results are as follows:
Searching for:咨询公司

Search Result:
doc id:1
content of Field:咨询公司
Terms:咨询        公司      
score:0.74763227
-------------------------------------------
doc id:2
content of Field:飞鹰咨询管理咨询公司
Terms:飞鹰        咨询      管理      咨询      公司      
score:0.6317303
-------------------------------------------
doc id:3
content of Field:北京中标咨询公司
Terms:北京        中标      咨询      公司      
score:0.5981058
-------------------------------------------
doc id:4
content of Field:重庆咨询公司
Terms:重庆        咨询      公司      
score:0.5981058
-------------------------------------------
doc id:5
content of Field:商务咨询服务公司
Terms:商务        咨询      服务      公司      
score:0.5981058
-------------------------------------------
doc id:6
content of Field:法律咨询公司
Terms:法律        咨询      公司      
score:0.5981058
-------------------------------------------

Docs 3, 4, 5, and 6 all have the same score, but I think docs 4 and 6 should
score higher than docs 3 and 5, because docs 4 and 6 have three terms while
docs 3 and 5 have four.
Am I right? Can anyone explain this? And how can I get the expected
result?
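
One possible explanation, which I have not confirmed against the Lucene source for this exact version: the default similarity computes the length norm as 1/sqrt(number of terms) and stores it in a single byte, so nearby lengths can collapse to the same stored value. The sketch below re-implements that one-byte encoding in plain Java, on the assumption that it matches SmallFloat.floatToByte315 (3-bit mantissa, zero exponent 15). The class name NormQuantization and the helpers storedNorm and relativeScore are hypothetical names used only for illustration; relativeScore further assumes that both query terms have the same idf and that coord and queryNorm are identical for every document.

```java
// Sketch only: plain-Java re-implementation of the one-byte norm encoding
// that the default similarity is ASSUMED to use (SmallFloat.floatToByte315).
// No Lucene classes are used; all names here are illustrative.
class NormQuantization {

    // Encode a positive float into one byte, truncating the mantissa.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                // overflow
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back into a float.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Norm after an encode/decode round trip, assuming
    // norm = 1/sqrt(numTerms) and a field boost of 1.0.
    static float storedNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return byte315ToFloat(floatToByte315(exact));
    }

    // Score up to a constant factor: sum of sqrt(tf) over the query terms,
    // multiplied by the stored norm (equal idf per term is assumed).
    static double relativeScore(int numTerms, int... termFreqs) {
        double sum = 0.0;
        for (int tf : termFreqs) {
            sum += Math.sqrt(tf);
        }
        return sum * storedNorm(numTerms);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + " terms: exact=" + (1.0 / Math.sqrt(n))
                    + " stored=" + storedNorm(n));
        }
        double doc1 = relativeScore(2, 1, 1); // 咨询 公司
        System.out.println("3 or 4 terms / doc 1: " + relativeScore(3, 1, 1) / doc1);
        System.out.println("doc 2 / doc 1: " + relativeScore(5, 2, 1) / doc1);
    }
}
```

Under these assumptions, fields with three terms and with four terms both end up with a stored norm of 0.5, while a two-term field gets 0.625. The predicted ratio 0.5 / 0.625 = 0.8 agrees with the observed 0.5981058 / 0.74763227, and the predicted ratio for doc 2 (about 0.845) agrees with 0.6317303 / 0.74763227. If this is what is happening, the identical scores would come from the precision of the stored norm rather than from the analyzer.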



