Kostas V. wrote:
I have the Analyzers for both languages (they do stemming as well), but I
don't know how to use them together. I imagine I have to do two passes
for each paper? Or is that not correct?
This is how I use my English Analyzer:

IndexWriter writer = new IndexWriter(indexPath, new PorterStemAnalyzer(), true);

And this is how I use the Greek one:

IndexWriter writer = new IndexWriter(indexPath, new GreekAnalyzer(), true);

Is that possible?
And when I search, how can the program use both Analyzers?
I was told to make a mixed Analyzer, but I don't know if that is possible.

The general idea would be to make an analyser which chooses which analyser to pass the text to. In general this would be rather difficult, but in your particular case Greek and English use different alphabets, so it may not be too hard.
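To make the "different alphabets" point concrete, here is a minimal, stdlib-only sketch (not Lucene code) of the per-token decision such a choosing analyser would make. It assumes a token counts as Greek if any of its code points is in the Unicode Greek script, and everything else goes to the English side; the names `ScriptRouter`, `Lang` and `classify` are hypothetical.

```java
import java.lang.Character.UnicodeScript;

public class ScriptRouter {
    // Hypothetical label for which analyser a token should be sent to.
    enum Lang { GREEK, ENGLISH }

    // A token is treated as Greek if any code point belongs to the Greek script;
    // Latin letters, digits and punctuation all fall through to ENGLISH.
    static Lang classify(String token) {
        boolean greek = token.codePoints()
                .anyMatch(cp -> UnicodeScript.of(cp) == UnicodeScript.GREEK);
        return greek ? Lang.GREEK : Lang.ENGLISH;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"λέξεις", "searching", "Lucene"}) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

Because the two alphabets don't overlap, this check is cheap and unambiguous, which is what makes the mixed analyser feasible here but hard for, say, English and French.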

From a quick look at GreekAnalyzer, it still uses the StandardTokenizer, and the filters it applies don't look like they would interfere with the English analyser's filters either. So you could probably make an analyser which applies both, something like this:

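  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.PorterStemFilter;
  import org.apache.lucene.analysis.StopAnalyzer;
  import org.apache.lucene.analysis.StopFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.el.GreekAnalyzer;
  import org.apache.lucene.analysis.standard.StandardFilter;

  public class CombinedAnalyser extends Analyzer {
    private GreekAnalyzer greek = new GreekAnalyzer();
    public TokenStream tokenStream(String fieldName, Reader reader) {
      // Greek: StandardTokenizer followed by GreekAnalyzer's own filters
      TokenStream tokens = greek.tokenStream(fieldName, reader);

      // English: standard cleanup, lower-casing, English stop words, Porter stemming
      tokens = new StandardFilter(tokens);
      tokens = new LowerCaseFilter(tokens);
      tokens = new StopFilter(tokens, StopAnalyzer.ENGLISH_STOP_WORDS); // StopFilter needs a stop list
      tokens = new PorterStemFilter(tokens);

      return tokens;
    }
  }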
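Chaining works because each language's stop list (and stemmer) only ever matches tokens written in its own alphabet, so the order of the stages hardly matters. A toy, stdlib-only illustration of that reasoning follows; the three-word stop lists and the names `ChainSketch` and `chain` are made up for the example and are not Lucene's real lists or API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class ChainSketch {
    // Made-up, tiny stop lists standing in for the real Greek and English ones.
    static final Set<String> GREEK_STOP = Set.of("και", "το");
    static final Set<String> ENGLISH_STOP = Set.of("the", "and", "of");

    // Every token passes through both "language" stages in sequence,
    // much like tokens flowing through a chain of TokenFilters.
    static List<String> chain(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))   // lower-cases Greek and Latin alike
                .filter(t -> !GREEK_STOP.contains(t))   // Greek stage: never matches a Latin token
                .filter(t -> !ENGLISH_STOP.contains(t)) // English stage: never matches a Greek token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(chain(List.of("The", "λέξεις", "ΚΑΙ", "search", "of")));
    }
}
```

Swapping the two `filter` calls gives the same result, which is the property the combined analyser relies on.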

Another way to go about it would be to detect the Greek fragments of the text up front, pass those fragments through the Greek analyser, and pass everything else through the other analyser.
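The up-front splitting could look roughly like the stdlib-only sketch below. It assumes that letters alone decide the script, while spaces, digits and punctuation stay in whichever run they appear in, so a run only breaks where a Greek letter follows a non-Greek one or vice versa. `GreekFragments`, `Fragment` and `split` are hypothetical names, and the `record` needs Java 16 or later.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class GreekFragments {
    // A run of text and whether it was classified as Greek (hypothetical helper type).
    record Fragment(String text, boolean greek) {}

    static List<Fragment> split(String text) {
        List<Fragment> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Boolean currentGreek = null; // unknown until the first letter is seen
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                boolean greek = UnicodeScript.of(cp) == UnicodeScript.GREEK;
                // Script changed between letters: close the current run.
                if (currentGreek != null && greek != currentGreek) {
                    out.add(new Fragment(current.toString(), currentGreek));
                    current.setLength(0);
                }
                currentGreek = greek;
            }
            current.appendCodePoint(cp); // non-letters stay in the current run
        }
        if (current.length() > 0) {
            out.add(new Fragment(current.toString(), Boolean.TRUE.equals(currentGreek)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Fragment f : split("Lucene και Java")) {
            System.out.println((f.greek() ? "GREEK:   " : "ENGLISH: ") + "[" + f.text() + "]");
        }
    }
}
```

Each fragment's text could then be wrapped in a `java.io.StringReader` and handed to the matching analyser's `tokenStream`, with the resulting tokens added to the same field; how best to merge those streams in Lucene is left open here.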

Daniel


--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902
