deyinchen created LUCENE-6111:
---------------------------------
Summary: Add Chinese Word Segmentation Analyzer with Ansj
implementation
Key: LUCENE-6111
URL: https://issues.apache.org/jira/browse/LUCENE-6111
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.6
Reporter: deyinchen
Priority: Minor
Fix For: 4.6
When I use mahout-0.9, which depends on lucene-4.6, to run the k-means
clustering algorithm, I find that the default analyzer class,
'org.apache.lucene.analysis.standard.StandardAnalyzer', handles Chinese text
poorly: it can only split the text into single characters rather than words.
The Ansj Chinese word segmentation tool is widely used for tokenizing Chinese
documents, and I would like to contribute an analyzer based on it to Lucene.
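
To illustrate the difference described above, here is a minimal Python sketch
that contrasts per-character splitting with dictionary-based word
segmentation. It is not Lucene or Ansj code: the function names and the toy
dictionary are hypothetical, and forward maximum matching is used only as a
simple stand-in for a real segmenter, not as a description of how Ansj works.

```python
def unigram_tokens(text):
    """Emit each CJK ideograph as its own token (per-character splitting)."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def max_match_tokens(text, dictionary):
    """Forward maximum matching: take the longest dictionary word at each position."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"中文", "分词", "工具"}

sample = "中文分词工具"  # "Chinese word segmentation tool"
print(unigram_tokens(sample))
print(max_match_tokens(sample, TOY_DICTIONARY))
```

With this toy dictionary, the first call yields six single-character tokens,
while the second yields three two-character words, which is the kind of output
a clustering job over Chinese documents would usually want.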