I got the same warning when I compiled the patch. I assume that combining
my patch with the patch for Bug 30844 (or the latest CVS) would remove
it, but I haven't tested that yet. I'll get around to it after I finish
my current work (which uses Lucene to index Japanese documents) under a
looming deadline. :-)
Joey
On Tuesday, September 7, 2004, at 01:19 PM, Daniel Naber wrote:
On Monday 23 August 2004 13:46, Joey Lawrance wrote:
I've attached the HTMLParser.jj file that successfully parses Japanese
HTML for indexing.
Joey,
thanks for the patch. When I compile it with "ant javacc-HTMLParser" I
get this warning:
"Warning: Line 364, Column 3: Non-ASCII characters used in regular
expression. Please make sure you use the correct Reader when you create
the parser that can handle your character set."
Is it okay to get this warning? The line the warning refers to is this
one:
| < CJK: // non-alphabets
Besides that, the patch seems to work, i.e. the parser doesn't stop on
Japanese HTML files anymore, but that's all I can say, as I don't speak
Japanese.
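As far as I understand the warning, it is about handing the parser a
Reader that decodes the input bytes with the right character set. Below
is a minimal sketch of that idea in plain Java. The class name
ReaderCharsetSketch and the helper readAll are made up for illustration,
UTF-8 is only an assumed encoding for the sample input, and the
HTMLParser call is shown only in a comment because I am assuming the
generated parser accepts a java.io.Reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Lucene or of the patch.
public class ReaderCharsetSketch {

    // Decode a byte stream with an explicitly named charset instead of
    // relying on the platform default encoding.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)));
        // Assumption: a JavaCC-generated parser could be given this same
        // Reader, e.g. new HTMLParser(reader), instead of reading it here.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample Japanese HTML, written with Unicode escapes so the source
        // file itself stays ASCII.
        String html = "<html><body>\u65e5\u672c\u8a9e</body></html>";
        byte[] bytes = html.getBytes("UTF-8");

        // Decoding with the charset the bytes were written in round-trips.
        String decoded = readAll(new ByteArrayInputStream(bytes), "UTF-8");
        System.out.println(decoded.equals(html));
    }
}
```

If the bytes were decoded with a different charset than the one they
were written in, the CJK characters would not survive, which is
presumably what the warning is guarding against.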
Regards
Daniel
--
http://www.danielnaber.de