twosom opened a new issue, #14940: URL: https://github.com/apache/lucene/issues/14940
### Description

## Overview

This issue proposes adding metadata support to the Nori Korean analyzer, so that users can attach additional information to dictionary words and read it back during text analysis.

## Background and Motivation

Currently, the Nori analyzer lets users register words in a custom dictionary, but there is no way to associate additional information with those words. Supporting metadata would enable:

- Attaching semantic category information to words (e.g., "Java" -> "programming language")
- Preserving information for compound words
- Custom tagging and classification
- Domain-specific annotations

## Proposed Implementation

1. Add metadata support to the Token class
2. Create `MetadataAttribute` and its implementation
3. Extend the user dictionary format with a metadata separator (`>>`)
4. Preserve metadata during compound word decomposition

## Usage Example

User dictionary:

```
자바 >> computer language
java >> computer language
엘라스틱서치 엘라스틱 서치 >> search engine
```

With this dictionary, the input `자바` should produce:

```
/* Output:
Term: 자바
Metadata: computer language
POS: NNG
---
```

and the input `엘라스틱서치` should produce:

```
/* Output:
Term: 엘라스틱서치
Metadata: search engine
Position Increment: 1
Position Length: 2
---
Term: 엘라스틱
Metadata: search engine
Position Increment: 0
Position Length: 1
---
Term: 서치
Metadata: search engine
Position Increment: 1
Position Length: 1
---
```

## Benefits

1. Enhanced information modeling: attach additional information to words to improve search quality
2. Domain-specific analysis: define metadata relevant to specific domains
3. Custom dictionary extension: add capabilities while maintaining backward compatibility
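To make the proposed `>>` dictionary syntax concrete, here is a minimal sketch of how one user dictionary line might be split into a word, its optional compound segmentation, and its optional metadata. This is an illustration only, not code from Lucene or from the patch: `UserDictLineSketch`, `Entry`, and `parseLine` are hypothetical names, and the sketch assumes the metadata is everything after the first `>>` on the line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not part of Lucene or the Nori module.
final class UserDictLineSketch {
  // Separator proposed in this issue.
  static final String METADATA_SEPARATOR = ">>";

  // One parsed dictionary line.
  static final class Entry {
    final String surface;        // the dictionary word itself
    final List<String> segments; // compound decomposition, empty if none
    final String metadata;       // null when the line has no metadata

    Entry(String surface, List<String> segments, String metadata) {
      this.surface = surface;
      this.segments = segments;
      this.metadata = metadata;
    }
  }

  static Entry parseLine(String line) {
    String rule = line;
    String metadata = null;

    // Assumption: metadata is everything after the first ">>".
    int sep = line.indexOf(METADATA_SEPARATOR);
    if (sep >= 0) {
      rule = line.substring(0, sep);
      String tail = line.substring(sep + METADATA_SEPARATOR.length()).trim();
      if (!tail.isEmpty()) {
        metadata = tail;
      }
    }

    // The first token is the word; any remaining tokens are its segments.
    String[] parts = rule.trim().split("\\s+");
    if (parts[0].isEmpty()) {
      throw new IllegalArgumentException("missing word in line: " + line);
    }
    List<String> segments = new ArrayList<>();
    for (int i = 1; i < parts.length; i++) {
      segments.add(parts[i]);
    }
    return new Entry(parts[0], Collections.unmodifiableList(segments), metadata);
  }

  public static void main(String[] args) {
    String[] lines = {
      "자바 >> computer language",
      "엘라스틱서치 엘라스틱 서치 >> search engine",
      "루씬"
    };
    for (String line : lines) {
      Entry e = parseLine(line);
      System.out.println(e.surface + " | " + e.segments + " | " + e.metadata);
    }
  }
}
```

Lines without `>>` parse exactly as they do today, which is how the format stays backward compatible. One consequence of this choice is that a dictionary word containing `>>` itself would be split incorrectly, so a real implementation would probably need an escaping rule or a different separator for that case.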
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org