[ https://issues.apache.org/jira/browse/LUCENE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667837#action_12667837 ]
Luis Alves commented on LUCENE-1528:
------------------------------------

Hi Michael,

I checked the book "Generating Parsers with JavaCC" and the JavaCC grammar documentation (https://javacc.dev.java.net/doc/javaccgrm.html). Here is the syntax for a character list:

    character_list ::= [ "~" ] "[" [ character_descriptor ( "," character_descriptor )* ] "]"
    character_descriptor ::= java_string_literal [ "-" java_string_literal ]

Also, the '|' character in JavaCC syntax is used like an XOR, and as far as I'm aware there is no OR or AND operator in JavaCC syntax. So the expression

    <_WHITESPACE> | [ "+", ... ]

would have to look like

    ~(<_WHITESPACE> & [ "+", ... ])

but this is not possible in a JavaCC grammar. So I think the best option for now is to keep the current syntax.

If you like, I can change

    <#_WHITESPACE: ( " " | "\t" | "\n" | "\r") >

to a character list to make it more consistent, but that would not help remove the duplicated list of characters:

    <#_WHITESPACE: [ " ", "\t", "\n", "\r" ] >

> Add support for Ideographic Space to the queryparser - also known as fullwidth space and wide-space
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1528
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1528
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4.1
>            Reporter: Luis Alves
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4.1
>
>         Attachments: lucene_wide_space_v1_src.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Ideographic Space is a space character that is as wide as a normal CJK character cell.
> It is also known as wide-space or fullwidth space. This type of space is used in CJK languages.
> This patch adds support for the wide space, making the queryparser component more friendly
> to queries that contain CJK text.
>
> Reference:
> 'http://en.wikipedia.org/wiki/Space_(punctuation)' - see Table of spaces, char U+3000.
>
> I also added a new test case that fails before the patch.
> After the patch is applied, all JUnit tests pass.
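The plain-Java sketch below is only an illustration. It is not Lucene code, not the JavaCC grammar, and not the attached patch. It keeps a single whitespace set (the four characters of _WHITESPACE plus U+3000 from this issue) and derives a term-start predicate from that one set, giving the "not whitespace and not special" combination that, per the comment above, the JavaCC character list can't express without repeating the whitespace characters. The class name, the method names, and the special-character subset "+-:" are hypothetical; the real special-character list is elided as [ "+", ... ] in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: one whitespace set, reused twice.
class WideSpaceSketch {
    // The characters of <#_WHITESPACE: [ " ", "\t", "\n", "\r" ] > plus U+3000 IDEOGRAPHIC SPACE.
    static final String WHITESPACE = " \t\n\r\u3000";

    // Hypothetical, partial set of query-syntax special characters (the real list is elided).
    static final String SPECIAL = "+-:";

    static boolean isWhitespace(char c) {
        return WHITESPACE.indexOf(c) >= 0;
    }

    // Term-start chars = not whitespace AND not special, reusing WHITESPACE instead of
    // repeating its characters (the duplication the grammar cannot avoid).
    static boolean isTermStartChar(char c) {
        return !isWhitespace(c) && SPECIAL.indexOf(c) < 0;
    }

    // Splits a query on the whitespace set above; runs of whitespace yield no empty terms.
    static List<String> splitOnWhitespace(String query) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (isWhitespace(c)) {
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Two CJK words separated by an ideographic space (U+3000).
        String query = "\u65e5\u672c\u3000\u4e2d\u56fd";
        System.out.println(splitOnWhitespace(query).size()); // 2
        System.out.println(isTermStartChar('\u3000'));        // false
        System.out.println(isTermStartChar('+'));             // false
        System.out.println(Character.isWhitespace('\u3000')); // true
    }
}
```

As a cross-check, java.lang.Character.isWhitespace already returns true for U+3000, because it is a Unicode space separator (category Zs) and not one of the non-breaking spaces that method excludes.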