I just deleted the "ParseException.java" file, and everything works fine now.
Thank you.

Gilles Moyse

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Saturday, October 11, 2003 06:04
To: Lucene Users List
Subject: Re: StandardTokenizer Problem

Look at Lucene's build file (the one in CVS) and how it deals with this
situation. It does this:

    <target name="javacc-StandardAnalyzer" depends="init,javacc-check" if="javacc.present">
      <!-- generate this in a build directory so we can exclude ParseException -->
      <mkdir dir="${build.dir}/gen/org/apache/lucene/analysis/standard"/>
      <antcall target="invoke-javacc">
        <param name="target"
               location="src/java/org/apache/lucene/analysis/standard/StandardTokenizer.jj"/>
        <param name="output.dir"
               location="${build.dir}/gen/org/apache/lucene/analysis/standard"/>
      </antcall>
      <copy todir="src/java/org/apache/lucene/analysis/standard">
        <fileset dir="${build.dir}/gen/org/apache/lucene/analysis/standard">
          <include name="*.java"/>
          <exclude name="ParseException.java"/>
        </fileset>
      </copy>
    </target>

which ignores the ParseException.java that was generated.
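The reason excluding the generated file works is presumably that the ParseException.java kept in Lucene's source tree subclasses IOException, so the generated throws clause (ParseException, IOException) stays compatible with TokenStream.next(), which declares only IOException. Below is a minimal sketch of that idea, not a copy of Lucene's class: the package name is taken from the thread, and the constructor that the generated parser actually calls is omitted.

    package org.apache.lucene.analysis.standard;

    import java.io.IOException;

    // Sketch only: a ParseException that extends IOException keeps the
    // generated signature "throws ParseException, IOException" compatible
    // with TokenStream.next(), which declares only IOException. Lucene's
    // real class also defines the constructor the generated parser calls
    // (current token, expected token sequences, token images).
    public class ParseException extends IOException {

        public ParseException() {
            super();
        }

        public ParseException(String message) {
            super(message);
        }
    }

By the same reasoning, whatever ParseException the generated MICTokenizer ends up resolving to must be assignable to IOException, or it must be excluded from compilation the way the Ant target above does.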
On Friday, October 10, 2003, at 07:48 AM, MOYSE Gilles (Cetelem) wrote:

> Hi all.
>
> I need to define my own tokenizer so that it detects accented characters.
> So as not to modify the Lucene classes, I made a copy of
> StandardTokenizer.jj in another package, then changed the names
> (StandardTokenizer becomes MICTokenizer; MIC is the name of my
> application).
>
> After running JavaCC, I get the following error while compiling
> MICTokenizer.java:
>
>     Exception ParseException is not compatible with throws clause in
>     org.apache.lucene.analysis.TokenStream.next()
>
> The definition of the next() method in TokenStream is as follows:
>
>     abstract public Token next() throws IOException;
>
> whereas JavaCC generates the following next() method in
> MICTokenizer.java from MICTokenizer.jj:
>
>     final public org.apache.lucene.analysis.Token next() throws
>         ParseException, IOException
>
> The compiler seems right: the next() method in MICTokenizer.java does
> not have a compatible signature.
>
> But why is this error raised for my class MICTokenizer.java, generated
> from MICTokenizer.jj, while StandardTokenizer.java, generated from
> StandardTokenizer.jj, compiles fine with the same next() signature:
>
>     final public org.apache.lucene.analysis.Token next() throws
>         ParseException, IOException
>
> If I remove "ParseException" from the method signature, the compiler
> complains about the jj_consume_token() calls inside the method, which
> throw ParseException.
>
> Any help welcome.
>
> Thanks
>
> Gilles Moyse
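For reference, the compiler message quoted above reflects the standard Java overriding rule: an overriding method may not add checked exceptions that the overridden method does not declare; it may only keep them, drop them, or use subclasses of them. A minimal, self-contained illustration (the class names here are invented for the example):

    import java.io.IOException;

    // The supertype declares only IOException, exactly like TokenStream.next().
    abstract class BaseStream {
        abstract Object next() throws IOException;
    }

    class MyStream extends BaseStream {
        // An override may narrow the throws clause or throw subclasses of the
        // declared exceptions, but it may not add new checked exceptions.
        // Declaring "throws ParseException, IOException" here would compile
        // only if ParseException were a subclass of IOException (or unchecked).
        Object next() throws IOException {
            return null;
        }
    }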
> Here is a copy of my MICTokenizer.jj (for those interested, the Unicode
> breakdown of the accented letters is near the end of the grammar):
>
> /* ====================================================================
>  * The Apache Software License, Version 1.1
>  *
>  * Copyright (c) 2001 The Apache Software Foundation. All rights
>  * reserved.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  *
>  * 1. Redistributions of source code must retain the above copyright
>  *    notice, this list of conditions and the following disclaimer.
>  *
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in
>  *    the documentation and/or other materials provided with the
>  *    distribution.
>  *
>  * 3. The end-user documentation included with the redistribution,
>  *    if any, must include the following acknowledgment:
>  *       "This product includes software developed by the
>  *        Apache Software Foundation (http://www.apache.org/)."
>  *    Alternately, this acknowledgment may appear in the software itself,
>  *    if and wherever such third-party acknowledgments normally appear.
>  *
>  * 4. The names "Apache" and "Apache Software Foundation" and
>  *    "Apache Lucene" must not be used to endorse or promote products
>  *    derived from this software without prior written permission. For
>  *    written permission, please contact [EMAIL PROTECTED]
>  *
>  * 5. Products derived from this software may not be called "Apache",
>  *    "Apache Lucene", nor may "Apache" appear in their name, without
>  *    prior written permission of the Apache Software Foundation.
>  *
>  * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
>  * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
>  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
>  * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
>  * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>  * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>  * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>  * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
>  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
>  * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>  * SUCH DAMAGE.
>  * ====================================================================
>  *
>  * This software consists of voluntary contributions made by many
>  * individuals on behalf of the Apache Software Foundation. For more
>  * information on the Apache Software Foundation, please see
>  * <http://www.apache.org/>.
>  */
>
> options {
>   STATIC = false;
>   //IGNORE_CASE = true;
>   //BUILD_PARSER = false;
>   //UNICODE_INPUT = true;
>   USER_CHAR_STREAM = true;
>   OPTIMIZE_TOKEN_MANAGER = true;
>   //DEBUG_TOKEN_MANAGER = true;
> }
>
> PARSER_BEGIN(MICTokenizer)
>
> package com.cetelem.outildecisionnel.mic.analysis.tokenizers;
>
> import java.io.*;
>
> import org.apache.lucene.analysis.standard.FastCharStream;
>
> /** A grammar-based tokenizer constructed with JavaCC.
>  *
>  * <p>This should be a good tokenizer for most European-language documents.
>  *
>  * <p>Many applications have specific tokenizer needs. If this tokenizer
>  * does not suit your application, please consider copying this source code
>  * directory to your project and maintaining your own grammar-based
>  * tokenizer.
>  *
>  * <p>Now supports accents (returns a HAS_ACCENT token type) and integers
>  * (NUMBER token type).
>  */
> public class MICTokenizer extends org.apache.lucene.analysis.Tokenizer {
>
>   /** Constructs a tokenizer for this Reader. */
>   public MICTokenizer(Reader reader) {
>     this(new FastCharStream(reader));
>     this.input = reader;
>   }
> }
>
> PARSER_END(MICTokenizer)
>
> TOKEN : {                        // token patterns
>
>   // number
>   <NUMBER: (<DIGIT>)+ >
>
>   // at least one accented letter
> | <HAS_ACCENT:
>     (<LETTER>)*
>     <ACCENTUATED_LETTER>
>     (<LETTER>)* >
>
>   // basic word: a sequence of digits & letters
> | <ALPHANUM: (<LETTER>|<DIGIT>)+ >
>
>   // internal apostrophes: O'Reilly, you're, O'Reilly's
>   // use a post-filter to remove possessives
> | <APOSTROPHE: <ALPHA> ("'" <ALPHA>)+ >
>
>   // acronyms: U.S.A., I.B.M., etc.
>   // use a post-filter to remove dots
> | <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >
>
>   // company names like AT&T and Excite@Home
> | <COMPANY: <ALPHA> ("&"|"@") <ALPHA> >
>
>   // email addresses
> | <EMAIL: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+ >
>
>   // hostname
> | <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >
>
>   // floating point, serial, model numbers, ip addresses, etc.
>   // every other segment must have at least one digit
> | <NUM: (<ALPHANUM> <P> <HAS_DIGIT>
>        | <HAS_DIGIT> <P> <ALPHANUM>
>        | <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>        | <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>        | <ALPHANUM> <P> <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>        | <HAS_DIGIT> <P> <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>         )
>   >
>
> | <#P: ("_"|"-"|"/"|"."|",") >
>
>   // at least one digit
> | <#HAS_DIGIT:
>     (<LETTER>|<DIGIT>)*
>     <DIGIT>
>     (<LETTER>|<DIGIT>)* >
>
> | < #ALPHA: (<LETTER>)+ >
>
> | < #LETTER: (<NON_ACCENTUATED_LETTER>|<ACCENTUATED_LETTER>) >
>
> | < #NON_ACCENTUATED_LETTER:     // unicode letters
>       [
>        "\u0041"-"\u005a",        // upper case (A-Z)
>        "\u0061"-"\u007a",        // lower case (a-z)
>        "\u0100"-"\u1fff",        // the following letters may be considered
>                                  // accented, but they don't exist in Latin languages
>        "\u3040"-"\u318f",
>        "\u3300"-"\u337f",
>        "\u3400"-"\u3d2d",
>        "\u4e00"-"\u9fff",
>        "\uf900"-"\ufaff"
>       ]
>   >
>
> | < #ACCENTUATED_LETTER:         // unicode letters
>       [
>        "\u00c0"-"\u00c5",        // accented A
>        "\u00c6",                 // AE
>        "\u00c7",                 // C cedilla
>        "\u00c8"-"\u00cb",        // accented E
>        "\u00cc"-"\u00cf",        // accented I
>        "\u00d1",                 // N tilde
>        "\u00d2"-"\u00d6",        // accented O
>        "\u00d9"-"\u00dc",        // accented U
>        "\u00dd",                 // accented Y
>        "\u00e0"-"\u00e5",        // accented a
>        "\u00e6",                 // ae
>        "\u00e7",                 // c cedilla
>        "\u00e8"-"\u00eb",        // accented e
>        "\u00ec"-"\u00ef",        // accented i
>        "\u00f1",                 // n tilde
>        "\u00f2"-"\u00f6",        // accented o
>        "\u00f9"-"\u00fc",        // accented u
>        "\u00fd"-"\u00ff"         // accented y
>       ]
>   >
>
> | < #DIGIT:                      // unicode digits
>       [
>        "\u0030"-"\u0039",
>        "\u0660"-"\u0669",
>        "\u06f0"-"\u06f9",
>        "\u0966"-"\u096f",
>        "\u09e6"-"\u09ef",
>        "\u0a66"-"\u0a6f",
>        "\u0ae6"-"\u0aef",
>        "\u0b66"-"\u0b6f",
>        "\u0be7"-"\u0bef",
>        "\u0c66"-"\u0c6f",
>        "\u0ce6"-"\u0cef",
>        "\u0d66"-"\u0d6f",
>        "\u0e50"-"\u0e59",
>        "\u0ed0"-"\u0ed9",
>        "\u1040"-"\u1049"
>       ]
>   >
> }
>
> SKIP : {                         // skip unrecognized chars
>  <NOISE: ~[] >
> }
>
> /** Returns the next token in the stream, or null at EOS.
>  * <p>The returned token's type is set to an element of {@link
>  * MICTokenizerConstants#tokenImage}.
>  */
> org.apache.lucene.analysis.Token next() throws IOException :
> {
>   Token token = null;
> }
> {
>   ( token = <ALPHANUM> |
>     token = <APOSTROPHE> |
>     token = <ACRONYM> |
>     token = <COMPANY> |
>     token = <EMAIL> |
>     token = <HOST> |
>     token = <NUM> |
>     token = <NUMBER> |
>     token = <HAS_ACCENT> |
>     token = <EOF>
>    )
>   {
>     if (token.kind == EOF) {
>       return null;
>     } else {
>       return
>         new org.apache.lucene.analysis.Token(token.image,
>                                              token.beginColumn, token.endColumn,
>                                              tokenImage[token.kind]);
>     }
>   }
> }

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
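For completeness, here is a rough sketch of how a tokenizer like this is typically plugged into indexing. The MICAnalyzer class name is hypothetical and not part of the thread; the Analyzer signature assumed here is the one from the Lucene 1.x line discussed in the messages above.

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;

    import com.cetelem.outildecisionnel.mic.analysis.tokenizers.MICTokenizer;

    // Hypothetical glue code: exposes the generated tokenizer through
    // Lucene's Analyzer API so it can be handed to IndexWriter and
    // QueryParser.
    public class MICAnalyzer extends Analyzer {

        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Post-filters (lower-casing, stripping possessives and dots,
            // as the grammar comments suggest) would be chained here.
            return new MICTokenizer(reader);
        }
    }

An index built with new IndexWriter("index", new MICAnalyzer(), true) would then run every analyzed field through this grammar.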
