The ConstantExtractor is not the one responsible for the text indexing. I must admit it is a little confusing, but if you only want the Dutch part to be text indexed, you need to change the XMLContentExtractor configuration. Take a look at [1]; it is all explained there.
Also note that for just reindexing, you do not need to touch all documents again. That is only needed when you have changed an extractor that actually sets a property (the ConstantExtractor does, by the way, but the XMLContentExtractor does not; this is confusing, I know).

But what you should do, AFAICS, is keep the extractors the way they are and just change the target/scope you are searching from in the frontend: the frontend knows whether to search in the 'en' or 'nl' part. Just adjust the search scope, and you need to change nothing in the extractors.

-Ard

>
> Thank you for these useful tips.
> I'm starting to use two different branches, one for English, one
> for Dutch.
> But I encountered the following problem.
> I changed the URI attribute in all my extractors, from
>
> <extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
> uri="/files" content-type="text/xml">
> to
> <extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
> uri="/files/default.preview/content/nl" content-type="text/xml">
>
> I removed the directory that contains "slide_index", then I
> restarted the REPO. I verified the regeneration of all indexes.
> (I also used the RepoTouch tool.) But when I search for my
> documents, I also find documents in the EN branch
> (/files/default.preview/content/en).
> Why?
> It is very strange, because I changed all my extractors correctly.
> How can I debug this situation?
>
> Thanks in advance,
> Alessandro
>
>
> 2008/5/5 Ard Schrijvers <[EMAIL PROTECTED]>:
> > Hello Alessandro,
> >
> > If you distinguish your content hierarchically, why would you need
> > different extractors? Normally, when we have multiple languages, and
> > it is separated by structure, all you need is to account for it in
> > the frontend you are using.
> >
> > If you have multiple languages within one document, you can take a
> > look at [1]
> >
> > Regards Ard
> >
> > [1]
> > http://www.hippocms.org/display/CMS/Hippo+Repository+ConfigurableXMLContentExtractor
> >
> > > I'm adding multi-language support to an existing site.
> > > The site was built with Hippo CMS and Hippo repository.
> > >
> > > I'm focusing on the repository:
> > >
> > > The original structure contains only this branch:
> > >
> > > /default/files/default.preview/content/nl
> > >
> > > and I've just added a new English branch
> > >
> > > /default/files/default.preview/content/en
> > >
> > > The current extractors index all the content because they are
> > > configured like this:
> > >
> > > <extractor
> > > classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
> > > uri="/files" content-type="text/xml">
> > >
> > > The attribute uri="/files" is too generic, therefore a property
> > > like this:
> > >
> > > <instruction property="title" namespace="http://hippo.nl/cms/1.0"
> > > xpath="/document/content/general-item/title"/>
> > >
> > > contains English and Dutch elements.
> > >
> > > Moreover, I found in indexer.xml several <property> elements that use
> > > org.apache.lucene.analysis.nl.DutchAnalyzer.
> > >
> > > Hope someone can help me with some tips on multi-language
> > > repositories.
> > >
> > > Thanks in advance
> > >
> > > Alessandro
> > > ********************************************
> > > Hippocms-dev: Hippo CMS development public mailinglist
