Hi guys,

As the topic says, it seems that the length of the field does not affect the document score accurately when I use the Chinese analyzer (SmartChineseAnalyzer) in my code.
Index source code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    import org.junit.BeforeClass;
    import org.junit.Test;

    private static Directory DIRECTORY;

    @BeforeClass
    public static void before() throws IOException {
        DIRECTORY = new RAMDirectory();
        Analyzer chineseAnalyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
        IndexWriterConfig indexWriterConfig =
                new IndexWriterConfig(Version.LUCENE_40, chineseAnalyzer);

        // Indexed and stored, with norms kept so that field length can influence the score.
        FieldType nameType = new FieldType();
        nameType.setIndexed(true);
        nameType.setStored(true);
        nameType.setOmitNorms(false);

        List<String> nameList = new ArrayList<String>();
        nameList.add("咨询公司");
        nameList.add("飞鹰咨询管理咨询公司");
        nameList.add("北京中标咨询公司");
        nameList.add("重庆咨询公司");
        nameList.add("商务咨询服务公司");
        nameList.add("法律咨询公司");

        IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
        try {
            for (int i = 0; i < nameList.size(); i++) {
                Document document = new Document();
                document.add(new Field("name", nameList.get(i), nameType));
                document.add(new Field("id", String.valueOf(i + 1), nameType));
                indexWriter.addDocument(document);
            }
            indexWriter.commit();
        } finally {
            // Release the writer even if indexing fails.
            indexWriter.close();
        }
    }

Search snippet:

    @Test
    public void testChinese() throws IOException, ParseException {
        String keyword = "咨询公司";
        System.out.println("Searching for:" + keyword);
        System.out.println();

        IndexReader indexReader = DirectoryReader.open(DIRECTORY);
        try {
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            Query query = new QueryParser(Version.LUCENE_40, "name",
                    new SmartChineseAnalyzer(Version.LUCENE_40)).parse(keyword);
            TopDocs topDocs = indexSearcher.search(query, 15);

            System.out.println("Search Result:");
            if (null != topDocs && 0 < topDocs.totalHits) {
                for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                    Document hit = indexSearcher.doc(scoreDoc.doc);
                    String name = hit.get("name");
                    System.out.println("doc id:" + hit.get("id"));
                    System.out.println("content of Field:" + name);
                    dumpCNTokens(name); // my own helper (not shown); prints the analyzed terms
                    System.out.println("score:" + scoreDoc.score);
                    System.out.println("-------------------------------------------");
                }
            } else {
                System.out.println("no results");
            }
        } finally {
            indexReader.close();
        }
    }

And the search result is as follows:

    Searching for:咨询公司

    Search Result:
    doc id:1
    content of Field:咨询公司
    Terms:咨询 公司
    score:0.74763227
    -------------------------------------------
    doc id:2
    content of Field:飞鹰咨询管理咨询公司
    Terms:飞鹰 咨询 管理 咨询 公司
    score:0.6317303
    -------------------------------------------
    doc id:3
    content of Field:北京中标咨询公司
    Terms:北京 中标 咨询 公司
    score:0.5981058
    -------------------------------------------
    doc id:4
    content of Field:重庆咨询公司
    Terms:重庆 咨询 公司
    score:0.5981058
    -------------------------------------------
    doc id:5
    content of Field:商务咨询服务公司
    Terms:商务 咨询 服务 公司
    score:0.5981058
    -------------------------------------------
    doc id:6
    content of Field:法律咨询公司
    Terms:法律 咨询 公司
    score:0.5981058
    -------------------------------------------

Docs 3, 4, 5 and 6 all have the same score, but I think docs 4 and 6 should score higher than docs 3 and 5, because docs 4 and 6 have three terms while docs 3 and 5 have four terms.

Am I right? Can anyone give me an explanation? And how can I get the expected result?

--
View this message in context: http://lucene.472066.n3.nabble.com/Length-of-the-filed-does-not-affect-the-doc-score-accurately-for-chinese-analyzer-SmartChineseAnalyz-tp4111390.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
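
One possible explanation for the tie, offered as a sketch rather than a confirmed diagnosis: the default TF-IDF similarity computes a length norm of roughly 1/sqrt(number of terms) and then stores it in a single byte, which loses precision. The standalone class below does not use Lucene at all. It re-implements the one-byte encoding as I understand SmallFloat.floatToByte315 to work, so the bit layout, the class name NormQuantizationSketch and the helper names are assumptions for illustration only, and a field boost of 1 is assumed.

```java
// Standalone sketch (no Lucene dependency). The encoding is assumed to mirror
// Lucene's SmallFloat.floatToByte315 / byte315ToFloat; treat it as an illustration.
public class NormQuantizationSketch {

    // Raw length norm of a field with numTerms terms, assuming a boost of 1.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Lossy one-byte encoding: keeps part of the exponent and only the
    // top two fraction bits of the float.
    public static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // drop the 21 low fraction bits
        int zero = (63 - 15) << 3;       // offset of the smallest representable exponent
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1;            // overflow: clamp to the largest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode: rebuilds a float from the stored byte.
    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Compare the raw norm with the value that survives the one-byte round trip.
        for (int numTerms = 2; numTerms <= 5; numTerms++) {
            float raw = lengthNorm(numTerms);
            float stored = decode(encode(raw));
            System.out.println(numTerms + " terms: raw=" + raw + " stored=" + stored);
        }
    }
}
```

Under these assumptions, three-term and four-term fields both round-trip to a stored norm of 0.5, while a two-term field round-trips to 0.625. That matches the reported scores: 0.5981058 / 0.74763227 is 0.8, the same ratio as 0.5 / 0.625. If this is the cause, getting distinct scores for docs 3 to 6 would need norms stored with more precision (for example through a custom Similarity), which I have not verified on Lucene 4.0.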