Re: indexing rss feeds in multiple languages

Doron Cohen Wed, 21 Mar 2007 23:00:58 -0800

If the language is also known at search time, PerFieldAnalyzerWrapper
seems like a nice third option: a single document per feed, with a
separate field for each language and additional field(s) for the common
data; use PerFieldAnalyzerWrapper at both indexing and search time, and
a FieldSelector at search time to retrieve only the relevant field(s)
for matched documents. (I have never done this myself, though.)
- Doron
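
To make the suggestion above concrete, here is a rough sketch of the
idea in plain Java, with no Lucene dependency. Everything in it is an
assumption made for illustration: the class and method names are
hypothetical stand-ins rather than Lucene classes, the "content_<lang>"
field naming is invented, and the stop-word lists are toy examples. It
only models two things: choosing an analyzer by field name with a
default fallback (the role PerFieldAnalyzerWrapper plays), and keeping
only the common fields plus one language's content (the role a
FieldSelector would play at search time).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Plain-Java model of the idea; none of these names are Lucene classes.
class PerFieldAnalysisSketch {

    // Toy analyzer: lowercase, split on non-alphanumerics, drop stop words.
    static List<String> tokenize(String text, Locale locale, String... stopWords) {
        Set<String> stops = new HashSet<>(Arrays.asList(stopWords));
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(locale).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !stops.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Chooses an analyzer by field name and falls back to a default one.
    static final class PerFieldAnalyzer {
        private final Function<String, List<String>> fallback;
        private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

        PerFieldAnalyzer(Function<String, List<String>> fallback) {
            this.fallback = fallback;
        }

        void add(String field, Function<String, List<String>> analyzer) {
            byField.put(field, analyzer);
        }

        List<String> analyze(String field, String text) {
            return byField.getOrDefault(field, fallback).apply(text);
        }
    }

    // Assumed layout: one "content_<lang>" field per language; all other
    // fields (source, date, ...) are common and use the default analyzer.
    static PerFieldAnalyzer buildAnalyzer() {
        PerFieldAnalyzer a = new PerFieldAnalyzer(t -> tokenize(t, Locale.ROOT));
        a.add("content_en", t -> tokenize(t, Locale.ENGLISH, "the", "of", "a"));
        a.add("content_fr", t -> tokenize(t, Locale.FRENCH, "le", "la", "les", "de", "du"));
        return a;
    }

    // Models the field-selection step: common fields plus one language's content.
    static Map<String, String> selectFields(Map<String, String> doc, String lang) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith("content_") || name.equals("content_" + lang)) {
                out.put(name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One document per feed: common fields plus one field per language.
        Map<String, String> feed = new LinkedHashMap<>();
        feed.put("source", "Example Feed");
        feed.put("date", "2007-03-21");
        feed.put("content_en", "The price of oil");
        feed.put("content_fr", "La hausse du prix");

        PerFieldAnalyzer analyzer = buildAnalyzer();
        for (Map.Entry<String, String> e : feed.entrySet()) {
            System.out.println(e.getKey() + " -> " + analyzer.analyze(e.getKey(), e.getValue()));
        }
        System.out.println(selectFields(feed, "fr").keySet());
    }
}
```

The point of using the same per-field mapping at both indexing and
search time is that a query against a given language field gets
tokenized the same way the indexed text was.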


"Melanie Langlois" <[EMAIL PROTECTED]> wrote on 21/03/2007 23:03:03:

> Hi,
>
> I saw that there are many posts on the mailing list about indexing in
> multiple languages, so I will try not to post a duplicate question. In
> my case, I want to index RSS feeds, where one feed contains several
> items in different languages, and some common data for all the items
> (date, source, ...). After reading the different posts, I think I will
> create a document per item, index them in the same index using a
> language-specific analyzer each time, and store a lang field for
> language-specific searches. But I'm wondering how I should handle the
> common fields; it seems I have two options:
>
> 1: Store the common data in each item. What happens if duplicate
> information is entered: is it duplicated in the index?
>
> 2: Create a separate document for the common data. In this case I
> will need to link this data to all underlying items by storing some
> ids. The issue is that I would need to search the index twice if the
> search is done only by date, because I would need to retrieve the
> items' contents.
>
> Thanks in advance for your help.
>
> Mélanie

