Hi Jack,
I do not get an exception before changing the data files, and I do not get one
after changing them and building lucene-icu...jar with ant.
But changing the data files and running ant does not change the output.
So I decided to create the .nrm file manually, using the steps outlined in the
build.xml file:
<property name="gennorm2.src.files"
  value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
<property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
<property name="gennorm2.dst"
  value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>
<target name="gennorm2" depends="gen-utr30-data-files">
  <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See http://site.icu-project.org/ </echo>
  <mkdir dir="${build.dir}/gennorm2"/>
  <exec executable="gennorm2" failonerror="true">
    <arg value="-v"/>
    <arg value="-s"/>
    <arg value="${utr30.data.dir}"/>
    <arg line="${gennorm2.src.files}"/>
    <arg value="-o"/>
    <arg value="${gennorm2.tmp}"/>
  </exec>
  <!-- now convert binary file to big-endian -->
  <exec executable="icupkg" failonerror="true">
    <arg value="-tb"/>
    <arg value="${gennorm2.tmp}"/>
    <arg value="${gennorm2.dst}"/>
  </exec>
  <delete file="${gennorm2.tmp}"/>
</target>
namely:
  gennorm2 -v -s src/data/utr30 nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt \
    DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt \
    NativeDigitFolding.txt -o utr30.tmp
  icupkg -tb utr30.tmp utr30.nrm
Then I unpacked the lucene-icu...jar file, replaced the .nrm file, and created
a new jar file using jar cf.
Solr gives an error if I use this new .jar file.
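
One way to narrow down a "Header authentication failed" error is to compare the header of the stock utr30.nrm with that of the rebuilt one. The following is a minimal Python sketch, assuming the common ICU binary header layout described in ICU's udata.h (magic bytes 0xda 0x27 at offsets 2 and 3, followed by the UDataInfo fields). The function names are hypothetical and are not part of Lucene or ICU.

```python
import struct


def read_icu_header(data: bytes) -> dict:
    """Parse the first 24 bytes of an ICU binary data file (assumed layout)."""
    if len(data) < 24:
        raise ValueError("too short to contain an ICU data header")
    # The two magic bytes are single bytes, so they do not depend on endianness.
    if data[2] != 0xDA or data[3] != 0x27:
        raise ValueError("bad magic bytes, probably not an ICU data file")
    is_big_endian = data[8]
    order = ">" if is_big_endian else "<"
    (header_size,) = struct.unpack_from(order + "H", data, 0)
    return {
        "header_size": header_size,
        "big_endian": bool(is_big_endian),
        "charset_family": data[9],
        "sizeof_uchar": data[10],
        "data_format": data[12:16].decode("ascii", "replace"),
        "format_version": tuple(data[16:20]),
        "data_version": tuple(data[20:24]),
    }


def read_icu_header_from_file(path: str) -> dict:
    """Read only the header bytes of the file at `path` and parse them."""
    with open(path, "rb") as f:
        return read_icu_header(f.read(24))
```

Calling read_icu_header_from_file on both the original and the rebuilt .nrm file and comparing big_endian, data_format and format_version may show whether the rebuilt file differs from the one shipped in the jar. A difference would be only one possible explanation for the error, not a confirmed diagnosis.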
What I noticed is that the ant build does not actually run the gennorm2 target.
If I delete the gennorm2 entry from the build.xml file, utr30.nrm still gets
created by the ant build. I have even deleted these lines:
<target name="compile-core" depends="jar-analyzers-common, common.compile-core" />
<property name="utr30.data.dir" location="src/data/utr30"/>
<target name="gen-utr30-data-files" depends="compile-tools">
  <java
    classname="org.apache.lucene.analysis.icu.GenerateUTR30DataFiles"
    dir="${utr30.data.dir}"
    fork="true"
    failonerror="true">
    <classpath>
      <path refid="icujar"/>
      <pathelement location="${build.dir}/classes/tools"/>
    </classpath>
  </java>
</target>
It still gets created, so I wonder how ant creates it.
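
Whether any target in a build file pulls in gennorm2 can be checked mechanically. The sketch below is a hypothetical helper, not part of Ant or Lucene: it lists the targets whose depends attribute names a given target directly. It does not follow imported build files or indirect dependencies, and the sample XML is a cut-down illustration rather than the real build.xml.

```python
import xml.etree.ElementTree as ET


def targets_depending_on(build_xml_text: str, wanted: str) -> list:
    """Return names of targets that list `wanted` directly in `depends`."""
    root = ET.fromstring(build_xml_text)
    found = []
    for target in root.iter("target"):
        # `depends` is a comma-separated list; entries may carry whitespace.
        deps = [d.strip() for d in target.get("depends", "").split(",")]
        if wanted in deps:
            found.append(target.get("name"))
    return found


# Cut-down sample using target names quoted in this thread.
SAMPLE = """<project name="demo">
  <target name="gen-utr30-data-files" depends="compile-tools"/>
  <target name="gennorm2" depends="gen-utr30-data-files"/>
  <target name="compile-core" depends="jar-analyzers-common, common.compile-core"/>
</project>"""

print(targets_depending_on(SAMPLE, "gennorm2"))              # prints []
print(targets_depending_on(SAMPLE, "gen-utr30-data-files"))  # prints ['gennorm2']
```

An empty result for gennorm2 in the real file would suggest that the target only runs when it is invoked by name, which would be consistent with the behaviour described above.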
The ICU support team wrote that they do not have any such mappings,
that is, mappings between letters with diacritics and plain Latin letters.
Thanks.
Alex.
-----Original Message-----
From: Jack Krupansky <[email protected]>
To: java-user <[email protected]>
Sent: Fri, Feb 14, 2014 5:13 pm
Subject: Re: char mapping in lucene-icu
Do you get the exception if you run ant before changing the data files?
"Header authentication failed, please check if you have a valid ICU data
file"
Check with the ICU project as to the proper format for THEIR files. I mean,
this doesn't sound like a Lucene issue.
Maybe it could be as simple as whether the data file should have DOS or UNIX
or Mac line endings (CRLF vs. LF vs. CR). Be sure to use an editor that
satisfies the requirements of ICU.
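
A quick way to see which line endings an edited data file actually contains is to count them. This is a small Python sketch with a hypothetical function name. It only reports what is in the file and makes no claim about which convention gennorm2 requires.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}
```

Reading a file such as DiacriticFolding.txt in binary mode, once before and once after editing, and passing the bytes to this function would show whether the editor changed the line-ending convention.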
To be clear, Lucene itself does not have a published API for modifying the
mappings of ICU.
-- Jack Krupansky
-----Original Message-----
From: [email protected]
Sent: Friday, February 14, 2014 7:48 PM
To: [email protected]
Subject: char mapping in lucene-icu
Hello,
I am trying to use the lucene-icu library in solr-4.6.1. I need to change a
char mapping in lucene-icu. I have made changes to
lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt
and built the jar file using ant, but it did not help.
I took a look at lucene/analysis/icu/build.xml and saw these lines:
<property name="gennorm2.src.files"
  value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
<property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
<property name="gennorm2.dst"
  value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>
<target name="gennorm2" depends="gen-utr30-data-files">
  <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See http://site.icu-project.org/ </echo>
  <mkdir dir="${build.dir}/gennorm2"/>
  <exec executable="gennorm2" failonerror="true">
    <arg value="-v"/>
    <arg value="-s"/>
    <arg value="${utr30.data.dir}"/>
    <arg line="${gennorm2.src.files}"/>
    <arg value="-o"/>
    <arg value="${gennorm2.tmp}"/>
  </exec>
  <!-- now convert binary file to big-endian -->
  <exec executable="icupkg" failonerror="true">
    <arg value="-tb"/>
    <arg value="${gennorm2.tmp}"/>
    <arg value="${gennorm2.dst}"/>
  </exec>
  <delete file="${gennorm2.tmp}"/>
</target>
It looks like ant does not execute gennorm2. If I build the utr30.nrm file
manually using gennorm2 and replace utr30.nrm in the jar file, then starting
Solr gives the following error:
Caused by: java.lang.RuntimeException: java.io.IOException: ICU data file
error: Header authentication failed, please check if you have a valid ICU
data file
My questions are:
1. If the above code in the build file does not get executed, then how is the
utr30 file generated?
2. How can I change a character mapping?
Thanks.
Alex.