Hi Kenney,

I must admit that we currently don't have documentation on how to enable Chinese full-text indexing in DSpace.
However, if you are storing primarily Chinese full-text documents in your DSpace, I don't think it would be too difficult to change the current Solr indexing settings to support that.

Solr has some documentation on how best to index Chinese here: https://solr.apache.org/guide/8_0/language-analysis.html#traditional-chinese

What I think you'd want to do in DSpace is add a new fieldType called "text_mandarin" (or similar) to the 'search' schema: https://github.com/DSpace/DSpace/blob/main/dspace/solr/search/conf/schema.xml#L68-L104

This fieldType might look something like this:

    <fieldType name="text_mandarin" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Then, if you want the "fulltext" field (which stores the full text of documents) to always be indexed/parsed as Chinese, you'd change its type to "text_mandarin" (instead of just "text") here: https://github.com/DSpace/DSpace/blob/main/dspace/solr/search/conf/schema.xml#L237

Then you'd have to reindex everything in Solr (./dspace index-discovery -b).

I think this would work, but I'll admit I've never tried it, so it's always possible I'm overlooking a step to get this working.

Keep in mind, this would only change the behavior of full-text indexing/searching, and it would change that behavior globally (so all documents in DSpace would be assumed to contain Chinese text). Unfortunately, at this time, DSpace doesn't have any smart way to detect the language of a document and index each language differently.

If this sounds like what you need and you find it works for you, please let us know. That way we can more formally document similar instructions for others who may need them.
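For illustration only, the edited "fulltext" definition might look roughly like the fragment below. This is a sketch, not a copy of the DSpace schema: apart from name="fulltext" and the new type="text_mandarin", the attribute values shown are placeholder assumptions, so keep whatever the existing definition in schema.xml already uses. Two related points worth checking: Solr only picks up a schema change after the core is reloaded (for example, by restarting Solr), so that should happen before running the reindex; and according to the Solr Reference Guide, ICUTokenizerFactory is part of the analysis-extras contrib module, so its jars may need to be on Solr's classpath.

```xml
<!-- Hypothetical sketch: the only intended change is type="text" -> type="text_mandarin".
     indexed/stored/multiValued below are placeholders (assumptions);
     keep the values from the existing fulltext definition in schema.xml. -->
<field name="fulltext" type="text_mandarin" indexed="true" stored="true" multiValued="true"/>
```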
Tim

________________________________
From: [email protected] <[email protected]> on behalf of Kenney Guo <[email protected]>
Sent: Tuesday, August 30, 2022 8:14 PM
To: DSpace Technical Support <[email protected]>
Subject: [dspace-tech] documentation for Chinese full text indexing

Dear DSpace team,

With a default installation of DSpace 7.2, I am not able to search my Chinese documents well. After some research, I realized that I can configure the (word) Analyzer in Solr. However, I did not find any official documentation on how to do that. Could anyone point me to that documentation?

Thanks very much,
Kenney

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/edf5966d-8476-4a71-82d1-8b22e7b31b28n%40googlegroups.com.
