raghu298 opened a new issue, #195:
URL: https://github.com/apache/uima-ruta/issues/195

   We are getting OOM (out-of-memory) errors for one of our Ruta scripts on a 
particular text.
   As part of this, we tried enabling the simpleGreedyForComposed flag and saw 
our existing test cases fail:
   
   1. One test case, which processes a batch of texts, gets stuck and makes no 
progress.
   2. One text (which contains 13 email IDs), processed with the email Ruta 
script below, fails with the following error:
   
   java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "match" is null
   at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:387) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:479) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:112) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:72) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:63) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:42) ~[ruta-core-3.3.0.jar:?]
   at org.apache.uima.ruta.block.RutaScriptBlock.apply(RutaScriptBlock.java:74) ~[ruta-core-3.3.0.jar:?]
   
   The email Ruta script is:
   
   DocumentAnnotation{-> RETAINTYPE(SPACE)};
   
    ((W|NUM) (W|NUM)[0,1])+ "@" W+? PERIOD+? W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,5)};
   
    ((W|NUM) (W|NUM)[0,1])+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+? W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,7)};
   
    ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W+? PERIOD+? W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,6)};
   
    ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+? W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,8)}
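   
   To help anyone reproducing this, here is a small plain-Java sketch of what 
the REGEXP condition on the final W token appears to accept. It assumes the 
condition has to match the whole covered text of the token (not verified 
against the Ruta source here); the class and method names are hypothetical and 
not part of Ruta.

```java
import java.util.regex.Pattern;

public class TldCheck {
    // Pattern copied verbatim from the REGEXP condition in the rules above.
    private static final Pattern TLD =
            Pattern.compile("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})");

    // Assumption: the condition must match the entire token text,
    // so Matcher.matches() is used rather than Matcher.find().
    static boolean matchesTld(String token) {
        return TLD.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Sample tokens chosen for illustration only.
        for (String token : new String[] {"com", "hi", "info", "c"}) {
            System.out.println(token + " -> " + matchesTld(token));
        }
    }
}
```

   Under that assumption the final token must be exactly two or three letters, 
so the pattern could probably also be written as [a-zA-Z]{2,3}.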
   
   The masked text is:
   
   us...@example.com is my email id or us...@example.com or us...@example.com 
or us...@example.com or us...@example.com or us...@example.com.hi my name is 
Joe. My email id is us...@example.com.  My email id is us...@example.com. My 
email id is us...@example.com. My email id is also user10@example.comHi my new 
email id is use...@example.com and my alternate email ids are 
use...@example.com, use...@example.com, use...@example.com
   

