[jira] [Resolved] (SOLR-7509) Solr Multilingual Indexing with one field

Shawn Heisey (JIRA) Thu, 07 May 2015 08:07:43 -0700

     [ https://issues.apache.org/jira/browse/SOLR-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shawn Heisey resolved SOLR-7509.
--------------------------------
    Resolution: Invalid

Please use the solr-user mailing list or the IRC channel for support requests.  
Depending on the time of day, the IRC channel can be very responsive, but the 
mailing list reaches a LOT more people.

http://lucene.apache.org/solr/resources.html#irc

This issue tracker is primarily for bugs and feature requests.

> Solr Multilingual Indexing with one field
> -----------------------------------------
>
>                 Key: SOLR-7509
>                 URL: https://issues.apache.org/jira/browse/SOLR-7509
>             Project: Solr
>          Issue Type: Wish
>          Components: Schema and Analysis
>    Affects Versions: 4.2.1
>         Environment: Redhat Linux, 4 core, 12 GB
>            Reporter: Kuntal Ganguly
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Our current production index size is 1.5 TB with 3 shards. Currently we have 
> the following field type:
> <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.CustomNGramFilterFactory" minGramSize="3" maxGramSize="30" preserveOriginal="true"/>
>     </analyzer>
> </fieldType>
> The field type above works well for our US and English-language clients.
> We now have some new Chinese and Japanese clients, so I searched online for
> the best approach to multilingual indexing:
> http://www.basistech.com/indexing-strategies-for-multilingual-search-with-solr-and-rosette/
> https://docs.lucidworks.com/display/lweug/Multilingual+Indexing+and+Search
> There seem to be pros and cons associated with every approach.
> I then experimented with a single-field approach, and here is my new field
> type:
> <fieldType name="text_multi" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.CJKWidthFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.CJKBigramFilterFactory"/>
>     </analyzer>
>     <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.CJKWidthFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.CJKBigramFilterFactory"/>
>         <filter class="solr.CustomNGramFilterFactory" minGramSize="3" maxGramSize="30" preserveOriginal="true"/>
>     </analyzer>
> </fieldType>
> I have kept the same tokenizer and only changed the filters. It works well
> for all existing search use cases for English documents as well as the new
> use cases for Chinese/Japanese documents.
> I have the following questions for the Solr experts/developers:
> 1) Is this a correct approach, or am I missing something?
> 2) Can you give me an example where this new field type would cause a
> problem? A use case or scenario with an example would be very helpful.
> 3) Could this cause problems in the future as different clients come on
> board?
> Please provide some guidance.
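
For readers following the quoted text_ngram definition, here is a minimal Python sketch of what an index-time chain of keyword tokenizer, lowercase filter, and n-gram filter might produce. It assumes the reporter's CustomNGramFilterFactory behaves like an ordinary n-gram filter that also keeps the original token; that class is not a stock Solr filter, so its real behavior may differ. The function names ngram_terms and matches are hypothetical and exist only for illustration.

```python
def ngram_terms(value, min_gram=3, max_gram=30, preserve_original=True):
    """Approximate the index-time terms for one field value.

    Assumption: the custom n-gram filter emits every substring whose
    length is between min_gram and max_gram, plus the original token.
    """
    # KeywordTokenizer keeps the whole value as a single token;
    # LowerCaseFilter then lowercases it.
    token = value.lower()
    terms = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break
            terms.append(token[start:end])
    # preserveOriginal="true": keep the full token even if it is shorter
    # than min_gram or longer than max_gram.
    if preserve_original and token not in terms:
        terms.append(token)
    return terms


def matches(query, value):
    """Query side: the whole query is one lowercased token, with no n-grams."""
    return query.lower() in ngram_terms(value)


if __name__ == "__main__":
    print(ngram_terms("Solr"))
    print(matches("OLR", "Solr"))
```

Under these assumptions a query matches when it is a substring of at least three characters of the indexed value, which is why the query analyzer in the quoted definition omits the n-gram filter.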



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

