Can you help me optimize the attached Ruta script? We are blocked by the issue this script is causing.

On Wed, Apr 23, 2025 at 6:05 PM Raghunath Mahakud <mahakudraghunath...@gmail.com> wrote:

> Hi Peter,
>
> We are getting OOM issues with one of our Ruta scripts on a particular text.
> As part of investigating this, we tried enabling the simpleGreedyForComposed
> flag and saw our existing test cases fail:
>
> 1. One of the batch-of-texts test cases gets stuck and makes no progress.
> 2. One of the texts (which has 13 email IDs), run with the attached email
>    Ruta script, gets the error below:
>
> java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "match" is null
>     at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:387) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:479) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:112) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:72) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:63) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:42) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.block.RutaScriptBlock.apply(RutaScriptBlock.java:74) ~[ruta-core-3.3.0.jar:?]
>
> The email Ruta script is:
>
> DocumentAnnotation{-> RETAINTYPE(SPACE)};
>
> ((W|NUM) (W|NUM)[0,1])+ "@" W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,5)};
>
> ((W|NUM) (W|NUM)[0,1])+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,7)};
>
> ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,6)};
>
> ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,8)}
>
> Is the flag still buggy in the latest version of UIMA Ruta? Or am I doing
> something wrong? After enabling the flag, we see many errors.
>
> Regards,
> Raghunath

DECLARE EntityType (STRING entityType);

DocumentAnnotation{-> RETAINTYPE(SPACE,BREAK)};

((NUM (SPECIAL NUM)?)|EntityType{FEATURE("entityType", "amount")})
(COMMA|SPACE|BREAK)*
((W|NUM) (SPACE | PERIOD)?)*
(COMMA|SPACE|BREAK)*
(((W|NUM) (SPACE | PERIOD)?)*(COMMA|SPACE|BREAK)*)
((EntityType+{FEATURE("entityType", "location_indicator")} | (NUM{REGEXP(".....")} ("-" NUM{REGEXP("....")})?)) (COMMA|SPACE|BREAK)*)+ {-> MARK(EntityType,1,8)};

NUM+
// 123-1
(SPECIAL NUM+)? SPACE*
// Street lane road
((W|NUM) (SPACE | PERIOD)?)* (COMMA|SPACE)*
// City
(W SPACE?)+ (COMMA|SPACE)*
// state
(W SPACE?)+ (COMMA|SPACE)*
// pincode
NUM (COMMA|SPACE)*
W?{REGEXP("(?i)(USA|US|CANADA)") ->MARK(EntityType,1,2,3,4,5,6,7,8,9,10,11,12)};