Hello

I'm having a problem with the search system. I'm trying to
search in Thai within a custom document type. I compiled
lucene-analyzer-2.0 to get ThaiAnalyzer support and
checked out org.nuxeo.project.sample to start testing.
Then I modified "search-contrib.xml" to support full-text
search.

"search-contrib.xml"

<component name="org.nuxeo.project.sample.search">

  <extension
    target="org.nuxeo.ecm.core.search.service.SearchServiceImpl"
    point="indexableDocType">
    <indexableDocType indexAllSchemas="true" name="Sample" />
  </extension>
  <extension
    target="org.nuxeo.ecm.core.search.service.SearchServiceImpl"
    point="resource">
    <resource name="sample" prefix="sample" type="schema"
      indexAllFields="true">
      <field name="sample1" analyzer="default" stored="true"
        indexed="true" type="Text" binary="false" />
      <field name="sample2" analyzer="default" stored="true"
        indexed="true" type="Text" binary="false" sortable="true" />
    </resource>
  </extension>
  <extension
    target="org.nuxeo.ecm.core.search.service.SearchServiceImpl"
    point="fullTextField">
    <fullText name="ecm:fulltext" analyzer="default"
      blobExtractorName="nuxeoTransform">
      <field>dublincore:title</field>
      <field>dublincore:description</field>
      <field>file:content</field>
      <field>note:note</field>
      <field>sample:sample1</field>
      <field>sample:sample2</field>
      <mimetype name="application/pdf">pdf2text</mimetype>
      <mimetype name=".*/.*">any2text</mimetype>
    </fullText>
  </extension>
  <extension
    target="org.nuxeo.ecm.core.search.service.SearchServiceImpl"
    point="searchEngineBackend">

    <searchEngineBackend name="compass" default="true"
      class="org.nuxeo.ecm.core.search.backend.compass.CompassBackend">
      <configurationFileName>/sample.cfg.xml</configurationFileName>
    </searchEngineBackend>

  </extension>
</component>

As you can see, I had to add "dublincore:title",
"dublincore:description", "file:content" and "note:note"
to the fullTextField extension point. If I remove those
fields, full-text search no longer finds matches in
them. Is this a bug or not?

To apply ThaiAnalyzer, I configured Compass with
"sample.cfg.xml" and "sample.cpm.xml":

"sample.cfg.xml"
<?xml version="1.0"?>
<compass-core-config
  xmlns="http://www.opensymphony.com/compass/schema/core-config"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opensymphony.com/compass/schema/core-config
      http://www.opensymphony.com/compass/schema/compass-core-config.xsd">

  <compass name="default">

    <connection>
      <!-- IMPORTANT. This connection setting is a sample.
      It will be overridden by the compass backend's extension point for
      NXRuntime. To avoid overriding it the latest contribution to the
      extension point should be:

      <extension target="org.nuxeo.ecm.core.search.backend.compass.CompassBackend"
        point="connection">
        <default/>
      </extension>
      -->

      <jdbc managed="true"
        dialectClass="org.apache.lucene.store.jdbc.dialect.HSQLDialect"
        deleteMarkDeletedDelta="3600000">
        <dataSourceProvider>
          <jndi lookup="java:/nxsearch-compass" />
        </dataSourceProvider>
      </jdbc>
    </connection>

    <transaction commitBeforeCompletion="true"
      factory="org.compass.core.transaction.JTASyncTransactionFactory">
      <batchInsertSettings maxBufferedDocs="100" />
    </transaction>

    <converters>
      <converter name="date"
        type="org.compass.core.converter.basic.CalendarConverter" />
      <converter name="int"
        type="org.compass.core.converter.basic.IntConverter">
        <setting name="format" value="#0000000000" />
      </converter>
      <converter name="long"
        type="org.compass.core.converter.basic.LongConverter">
        <setting name="format" value="#00000000000000000000" />
      </converter>
    </converters>

    <searchEngine>
      <analyzer name="lowerWhitespace" type="CustomAnalyzer"
        analyzerClass="org.nuxeo.ecm.core.search.backend.compass.LowerWhitespaceAnalyzer" />
      <analyzer name="french" type="CustomAnalyzer"
        analyzerClass="org.apache.lucene.analysis.fr.FrenchAnalyzer" />
      <analyzer name="thai" type="CustomAnalyzer"
        analyzerClass="org.apache.lucene.analysis.th.ThaiAnalyzer" />
      <optimizer scheduleInterval="300" schedule="true" />
    </searchEngine>

    <mappings>
      <resource location="sample.cpm.xml" />
    </mappings>

  </compass>
</compass-core-config>

"sample.cpm.xml"
<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
    "-//Compass/Compass Core Mapping DTD 1.0//EN"
    "http://www.opensymphony.com/compass/dtd/compass-core-mapping.dtd">

<compass-core-mapping>

  <!-- Alias for Nuxeo Documents
    Customizers, don't change the alias name, nor the resource-id name, and
    don't try and use another alias
  -->
  <!-- all is useless with Nuxeo mixed type contents -->
  <resource alias="nxdoc" sub-index="nxdocs" analyzer="default" all="false">
    <resource-id name="nxdoc_id" />

    <resource-property name="dc:title" analyzer="thai" />
    <resource-property name="dc:description" analyzer="thai" />
    <resource-property name="sample:sample1" analyzer="thai" />
    <resource-property name="sample:sample2" analyzer="thai" />

    <!-- BUILTIN -->
    <resource-property name="ecm:isCheckedInVersion" converter="boolean"
      store="yes" index="un_tokenized" />

    <resource-property name="ecm:isProxy" converter="boolean"
      store="yes" index="un_tokenized" />


    <!-- DUBLINCORE -->
    <resource-property name="dc:created" converter="date" store="yes"
      index="un_tokenized" />
    <resource-property name="dc:modified" converter="date" store="yes"
      index="un_tokenized" />
    <resource-property name="dc:issued" converter="date" store="yes"
      index="un_tokenized" />
    <resource-property name="dc:valid" converter="date" store="yes"
      index="un_tokenized" />
    <resource-property name="dc:expired" converter="date" store="yes"
      index="un_tokenized" />

    <!-- COMMON -->
    <resource-property name="common:size" converter="int" store="yes"
      index="un_tokenized" />

    <!-- UID -->
    <!-- long is a shame for these small integers, but that's for
      consistency with what a DocumentModel holds (JCR doesn't know about shorts)
    -->
    <resource-property name="uid:major_version" converter="long" store="yes"
      index="un_tokenized" />
    <resource-property name="uid:minor_version" converter="long" store="yes"
      index="un_tokenized" />

  </resource>

  <!-- TODO move this to another file (and make possible to contribute
  several of them) -->
  <!-- TODO fill in properties that need it -->
  <resource alias="nxrel-default" sub-index="nxrels"
    analyzer="default" all="false">
    <resource-id name="nxdoc_id" />
    <!-- TODO use a better global name for id props -->
  </resource>

  <resource alias="nxrel-documentComments" sub-index="nxrels"
    analyzer="default" all="false">
    <resource-id name="nxdoc_id" />
  </resource>

</compass-core-mapping>

After adding ThaiAnalyzer to Compass and setting
"dc:title", "dc:description", "sample:sample1" and
"sample:sample2" to use it, I deployed this component
and re-indexed Nuxeo. But the Thai words are still not
tokenized. For example, with title="ประโคว่า" the
expected tokens are [ประโค] [ว่า], so searching for
"ประโค" should return the document with
title="ประโคว่า".
Did I do something wrong? Can you suggest what I
should do about this issue?
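One thing that may be worth checking: as far as I know, the ThaiAnalyzer in the Lucene 2.x contrib analyzers does not segment Thai by itself. Its ThaiWordFilter delegates to java.text.BreakIterator.getWordInstance with a Thai locale, so Thai words only get split if the server's JRE provides Thai dictionary-based word breaking. The stand-alone sketch below (ThaiSegmentCheck is a hypothetical class, not part of Nuxeo, Compass or Lucene) runs that same JDK API on the example title, so it can be run with the JVM that hosts Nuxeo without extra jars.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmentCheck {

    /**
     * Splits text with the JRE's word BreakIterator for the Thai locale and
     * keeps only segments containing at least one letter or digit
     * (whitespace and punctuation segments are dropped).
     */
    public static List<String> segment(String text) {
        BreakIterator words = BreakIterator.getWordInstance(new Locale("th"));
        words.setText(text);
        List<String> tokens = new ArrayList<String>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String piece = text.substring(start, end);
            if (hasLetterOrDigit(piece)) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    private static boolean hasLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The title from the example, written with Unicode escapes so the
        // result does not depend on the encoding of this source file.
        String title = "\u0E1B\u0E23\u0E30\u0E42\u0E04\u0E27\u0E48\u0E32";
        List<String> tokens = segment(title);
        System.out.println(tokens.size() + " segment(s)");
        for (String token : tokens) {
            System.out.println("[" + token + "]");
        }
    }
}
```

If it reports a single segment for the title, the JRE is not segmenting Thai and ThaiAnalyzer presumably cannot produce [ประโค] [ว่า] either. If it reports several segments, the analyzer setup itself is the more likely suspect; note, for instance, that the ecm:fulltext field in search-contrib.xml is still declared with analyzer="default".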

I am now translating the messages into Thai. It would
be helpful if message.properties supported UTF-8,
because I have to use "\uxxxx" escapes in
message.properties to display a Thai interface. I also
found a bug on the "New Document" page after you
select a document type. This page uses the same
"command.save" text as other pages (like the "Modify
Document" tab), but here it displays garbled
characters, so I can't read it.
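Until the bundles can be read as UTF-8, the "\uxxxx" step can at least be automated. Java's ResourceBundle has traditionally read .properties files as ISO-8859-1 (UTF-8 only became the default in Java 9), which is why the escapes are needed; older JDKs ship a native2ascii tool for this conversion. The sketch below is a minimal hypothetical helper (PropertiesEscaper is not a Nuxeo class) that escapes the non-ASCII characters of a value and reloads the result with java.util.Properties to confirm it round-trips. The Thai label is only an illustrative translation, not taken from Nuxeo's bundles.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscaper {

    /**
     * Replaces every non-ASCII character with a backslash-u escape
     * (four uppercase hex digits), the form an ISO-8859-1 properties
     * loader understands. Backslashes and other .properties special
     * characters are deliberately left untouched.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7E) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses properties text the way a .properties file is parsed and returns one value. */
    public static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected to happen
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Hypothetical Thai label for command.save, given as Unicode escapes
        String label = "\u0E1A\u0E31\u0E19\u0E17\u0E36\u0E01";
        String line = "command.save=" + escapeNonAscii(label);
        System.out.println(line);                                          // ASCII-only line for the file
        System.out.println(label.equals(loadValue(line, "command.save"))); // round-trip check
    }
}
```

Because the generated line is pure ASCII, it reads the same under any file encoding, which also makes it a useful way to rule out encoding problems when a label such as "command.save" shows up garbled on one page only.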

Sorry for my weak English

Best regards



       
_______________________________________________
ECM mailing list
[email protected]
http://lists.nuxeo.com/mailman/listinfo/ecm
