[ 
https://issues.apache.org/jira/browse/LUCENE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667837#action_12667837
 ] 

Luis Alves commented on LUCENE-1528:
------------------------------------

Hi Michael,

I checked the book "Generating Parsers with JavaCC" and the grammar 
documentation on the javacc website 
(https://javacc.dev.java.net/doc/javaccgrm.html).
Here is the syntax for a character list:

character_list       ::= [ "~" ] "[" [ character_descriptor ( "," character_descriptor )* ] "]"
character_descriptor ::= java_string_literal [ "-" java_string_literal ]

Also, the '|' character in javacc syntax is used like an XOR, and as far as 
I'm aware there is no OR or AND operator available in the javacc syntax.
So the expression <_WHITESPACE> | [ "+", ... ] would have to look like 
~(<_WHITESPACE> & [ "+", ... ]), but this is not possible in the javacc 
grammar, since "~" can only be applied to a character_list.

So I think the best option for now is to keep the current syntax.

If you like, I can change 

<#_WHITESPACE: ( " " | "\t" | "\n" | "\r") >

to a character_list, as shown below, to make it more consistent, but that 
would not remove the duplicated list of characters.

<#_WHITESPACE: [ " ", "\t", "\n", "\r" ] >
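
To illustrate why the duplication remains, here is a minimal Java sketch that 
models the two character lists as sets. The class and method names 
(CharListSketch, inList, isTermChar) are hypothetical and not part of Lucene 
or javacc, and the SPECIAL set holds only "+" because the rest of that list 
is not shown above.

```java
import java.util.Set;

// Hypothetical model of javacc character lists; not Lucene or javacc code.
class CharListSketch {
    // Mirrors [ " ", "\t", "\n", "\r" ] from the _WHITESPACE token above.
    static final Set<Character> WHITESPACE = Set.of(' ', '\t', '\n', '\r');

    // Only "+" is listed in the comment; the remaining entries are elided there.
    static final Set<Character> SPECIAL = Set.of('+');

    // A character list [ ... ] matches any single character in the set.
    static boolean inList(Set<Character> list, char c) {
        return list.contains(c);
    }

    // A negated list ~[ ... ] matches any single character NOT in the set.
    // In the grammar, "~" applies only to a literal list, so the whitespace
    // characters must be written out again inside it. In plain Java the same
    // result can be computed from the two sets without repeating them.
    static boolean isTermChar(char c) {
        return !(WHITESPACE.contains(c) || SPECIAL.contains(c));
    }

    public static void main(String[] args) {
        System.out.println(isTermChar('a'));      // ordinary letter
        System.out.println(isTermChar(' '));      // listed whitespace
        // U+3000 is not in the list shown above, so this model treats it
        // as a term character rather than as whitespace.
        System.out.println(isTermChar('\u3000'));
    }
}
```

In the grammar itself the union has to be spelled out by hand, which is the 
duplicated list mentioned above.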



> Add support for Ideographic Space to the queryparser - also known as fullwidth 
> space and wide-space
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1528
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1528
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4.1
>            Reporter: Luis Alves
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4.1
>
>         Attachments: lucene_wide_space_v1_src.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Ideographic Space is a space character that is as wide as a normal CJK 
> character cell.
> It is also known as wide-space or fullwidth space. This type of space is used 
> in CJK languages.
> This patch adds support for the wide space, making the queryparser component 
> more friendly to queries that contain CJK text.
> Reference:
> 'http://en.wikipedia.org/wiki/Space_(punctuation)' - see Table of spaces, 
> char U+3000.
> I also added a new test case that fails before the patch.
> After the patch is applied, all JUnit tests pass.

