[ 
https://issues.apache.org/jira/browse/LUCENE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930814#action_12930814
 ] 

M Alexander commented on LUCENE-2745:
-------------------------------------

Quick question - how difficult would it be to make the new StandardTokenizer 
(branch_3X), with its new capabilities (including properly tokenizing Arabic as 
well as identifying email addresses, hostnames, etc.), work with version 2.9.2?

Is it very difficult, or would it only require copying across a few classes and 
making minor tweaks?
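
To make the behaviour in the issue concrete, here is a minimal sketch in plain
Java. It does not use any Lucene classes. The class name TokenizeSketch, both
method names, the simplified email pattern and the address user@example.com
are all assumptions made for illustration only; this is not how
StandardTokenizer or ArabicAnalyzer is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not Lucene code.
public class TokenizeSketch {

    // Runs of letters/digits only, so '@' and '.' act as token breaks.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Simplified (assumed) email pattern tried first, then plain words.
    private static final Pattern EMAIL_OR_WORD = Pattern.compile(
        "[\\p{L}\\p{N}._%+-]+@[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)+"
        + "|[\\p{L}\\p{N}]+");

    // Mimics the reported behaviour: an address is split into its parts.
    public static List<String> splitOnNonLetters(String text) {
        return collect(WORD, text);
    }

    // Mimics the requested behaviour: an address stays a single token.
    public static List<String> keepEmails(String text) {
        return collect(EMAIL_OR_WORD, text);
    }

    private static List<String> collect(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "mail user@example.com now";
        System.out.println(splitOnNonLetters(text));
        System.out.println(keepEmails(text));
    }
}
```

The first method splits the address into three tokens, which is what the
issue reports; the second keeps it whole, which is what the issue asks for.
Because \p{L} matches Arabic letters too, the same sketch should apply to
mixed Arabic and Latin text.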

> ArabicAnalyzer - the ability to recognise email addresses host names and so on
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2745
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2745
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>         Environment: All
>            Reporter: M Alexander
>
> The ArabicAnalyzer does not recognise email addresses, hostnames and so on. 
> For example,
> [email protected]
> will be tokenised to [adam] [hotmail] [com]
> It would be great if the ArabicAnalyzer could tokenise this to 
> [[email protected]]. The same applies to hostnames and so on.
> Can this be resolved? I hope so.
> Thanks
> MAA

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
