Can you help me optimize the attached Ruta script?
We are blocked because this script is causing the issue.

On Wed, Apr 23, 2025 at 6:05 PM Raghunath Mahakud <
mahakudraghunath...@gmail.com> wrote:

> Hi Peter,
>
> We are getting out-of-memory (OOM) errors with one of our Ruta scripts on some texts.
> As part of addressing this, we tried enabling the simpleGreedyForComposed flag,
> and some of our existing test cases now fail:
>
> 1. One test case, which processes a batch of texts, gets stuck and makes no progress.
> 2. One text (containing 13 email addresses), processed with the email Ruta script
> shown below, fails with the following error:
>
> java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "match" is null
>     at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:387) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:479) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:112) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:79) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:72) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:63) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:42) ~[ruta-core-3.3.0.jar:?]
>     at org.apache.uima.ruta.block.RutaScriptBlock.apply(RutaScriptBlock.java:74) ~[ruta-core-3.3.0.jar:?]
>
> The email Ruta script is:
>
> DocumentAnnotation{-> RETAINTYPE(SPACE)};
>
>  ((W|NUM) (W|NUM)[0,1])+ "@" W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,5)};
>
>  ((W|NUM) (W|NUM)[0,1])+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,7)};
>
>  ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,6)};
>
> ((W|NUM)+ ("."|"_") )+ (W|NUM)+ "@" W[0,1]? PERIOD[0,1]? W+? PERIOD+?
> W{REGEXP("(?i)([a-zA-Z]{3}|[a-zA-Z]{2})") -> MARK(EntityType,1,8)};
>
> Is the flag still buggy in the latest version of UIMA Ruta?
>
> Or am I doing something wrong? After enabling the flag, we see many errors.
>
> Regards,
>
> Raghunath...
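For comparison, here is a rough Python sketch of the shape that the four email rules describe together. The local part is made of letter/digit runs, optionally joined by "." or "_". It is followed by "@", an optional subdomain label, a domain label and a period, and a final label of two or three letters. This is an approximation, not a translation of Ruta's token-based matching. It assumes a token is a run of ASCII letters or digits, and it allows exactly one period between domain labels, whereas the rules use PERIOD+?. Written as a single regular expression, the whole shape is checked in one scan of the text instead of through four overlapping rules with composed elements. The 13-address sample is made-up input, not the text that triggered the NullPointerException.

```python
import re
from typing import List

# Approximation of the four email rules combined (assumptions: a token is a run
# of ASCII letters or digits; domain labels are separated by a single period).
EMAIL = re.compile(
    r"[A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*"  # local part: letter/digit runs joined by "." or "_"
    r"@"
    r"(?:[A-Za-z]+\.)?"                   # optional subdomain, like W[0,1]? PERIOD[0,1]?
    r"[A-Za-z]+\."                        # domain label and period, like W+? PERIOD+?
    r"[A-Za-z]{2,3}(?![A-Za-z])"          # final label: a whole word of 2 or 3 letters
)


def find_emails(text: str) -> List[str]:
    """Return every non-overlapping email-like span, from left to right."""
    return [m.group(0) for m in EMAIL.finditer(text)]


if __name__ == "__main__":
    sample = ", ".join(f"user{i}@example.com" for i in range(13))
    print(len(find_emails(sample)))  # 13
    print(find_emails("Contact john.doe_1@mail.example.com or alice@example.org."))
```

As in the REGEXP condition on the last W, a final label of four or more letters (for example "info") is rejected.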
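DECLARE EntityType (STRING entityType);
DocumentAnnotation{-> RETAINTYPE(SPACE,BREAK)};

// This rule has six top-level rule elements, so MARK spans elements 1 to 6.
// 1: a number (optionally SPECIAL + number) or an "amount" EntityType
((NUM (SPECIAL NUM)?)|EntityType{FEATURE("entityType", "amount")})
// 2: separators
(COMMA|SPACE|BREAK)*
// 3: words or numbers, each optionally followed by SPACE or PERIOD
((W|NUM) (SPACE | PERIOD)?)*
// 4: separators
(COMMA|SPACE|BREAK)*
// 5: a second word group followed by separators
(((W|NUM) (SPACE | PERIOD)?)*(COMMA|SPACE|BREAK)*)
// 6: location indicators, or a five-character NUM optionally followed by "-"
//    and a four-character NUM, each followed by separators; repeated
((EntityType+{FEATURE("entityType", "location_indicator")} |
(NUM{REGEXP(".....")}
    ("-" NUM{REGEXP("....")})?))
(COMMA|SPACE|BREAK)*)+
{-> MARK(EntityType,1,6)};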




NUM+
// 123-1
(SPECIAL NUM+)?
SPACE*
// Street lane road
((W|NUM) (SPACE | PERIOD)?)*
(COMMA|SPACE)*
// City
(W SPACE?)+
(COMMA|SPACE)*
// state
(W SPACE?)+
(COMMA|SPACE)*
// pincode
NUM
(COMMA|SPACE)*
W?{REGEXP("(?i)(USA|US|CANADA)") ->MARK(EntityType,1,2,3,4,5,6,7,8,9,10,11,12)};
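
One possible contributor to the OOM in the first address rule is elements 3 to 5. This is an assumption; the report does not confirm it. These elements are ((W|NUM) (SPACE | PERIOD)?)*, then (COMMA|SPACE|BREAK)*, then the same pair again. Both word groups are unbounded. A SPACE after a word can be consumed either by (SPACE | PERIOD)? or by the separator group. As a result, a single run of words can be divided among these elements in many ways. The Python sketch below counts those divisions over the prefixes of a token string that begin at one start position. It encodes each token as one character, using a hypothetical encoding: "w" = W or NUM, "s" = SPACE, "p" = PERIOD, "c" = COMMA, "b" = BREAK. The count is only a rough proxy for the alternatives a backtracking matcher may have to track, not a model of Ruta's ComposedRuleElement internals. For m words separated by single spaces, it counts exactly 3m² divisions. With each word group capped at three repetitions, the count stays at 76 for runs of ten or more words.

```python
from typing import Optional

# Hypothetical one-character token encoding:
#   "w" = W or NUM, "s" = SPACE, "p" = PERIOD, "c" = COMMA, "b" = BREAK


def is_word_group(seg: str, max_reps: Optional[int] = None) -> bool:
    """Does seg match ((W|NUM) (SPACE | PERIOD)?)* with at most max_reps repetitions?"""
    reps, i = 0, 0
    while i < len(seg):
        if seg[i] != "w":
            return False          # every repetition must start with a word
        i += 1
        if i < len(seg) and seg[i] in "sp":
            i += 1                # optional SPACE or PERIOD after the word
        reps += 1
    return max_reps is None or reps <= max_reps


def is_separator_group(seg: str) -> bool:
    """Does seg match (COMMA|SPACE|BREAK)*?"""
    return all(t in "csb" for t in seg)


def count_divisions(tokens: str, max_reps: Optional[int] = None) -> int:
    """Count (i, j, k, l) such that tokens[:i], tokens[i:j], tokens[j:k], tokens[k:l]
    match word group, separator group, word group, separator group in order."""
    n, total = len(tokens), 0
    for i in range(n + 1):
        if not is_word_group(tokens[:i], max_reps):
            break                 # once invalid, a longer segment stays invalid
        for j in range(i, n + 1):
            if not is_separator_group(tokens[i:j]):
                break
            for k in range(j, n + 1):
                if not is_word_group(tokens[j:k], max_reps):
                    break
                for l in range(k, n + 1):
                    if not is_separator_group(tokens[k:l]):
                        break
                    total += 1
    return total


def sentence(m: int) -> str:
    """m words separated by single spaces, e.g. sentence(3) == "wswsw"."""
    return "s".join("w" * m)


if __name__ == "__main__":
    for m in (10, 20, 40):
        # unbounded vs. capped at 3 repetitions: "10 300 76", "20 1200 76", "40 4800 76"
        print(m, count_divisions(sentence(m)), count_divisions(sentence(m), max_reps=3))
```

In the script itself, such a cap would be expressed with Ruta's [min,max] quantifier, which the email rules already use as (W|NUM)[0,1]. The cap of three here is an arbitrary illustration, not a recommended value.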

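The second rule has a similar ambiguity when an address contains no commas. The street group ((W|NUM) (SPACE | PERIOD)?)* and the city and state groups (W SPACE?)+ all accept plain words, so nothing in the rule decides where the street ends and where the city and state begin. The Python sketch below ignores how the spaces are assigned. It treats the words between the house number and the pincode as a list and enumerates the (street, city, state) divisions that the three groups permit. The street may be empty, while the city and the state each need at least one word, so k words give k(k-1)/2 divisions. The example address and the function name are made up for illustration.

```python
from typing import List, Tuple

Division = Tuple[List[str], List[str], List[str]]


def street_city_state_divisions(words: List[str]) -> List[Division]:
    """Enumerate ways to split words, in order, into street (possibly empty),
    city (at least one word) and state (at least one word)."""
    n = len(words)
    divisions: List[Division] = []
    for a in range(0, n - 1):          # street = words[:a]
        for b in range(a + 1, n):      # city = words[a:b], state = words[b:]
            divisions.append((words[:a], words[a:b], words[b:]))
    return divisions


if __name__ == "__main__":
    # Hypothetical address "123 Main Street Springfield IL 62704 USA";
    # these are the words between the house number and the pincode.
    words = ["Main", "Street", "Springfield", "IL"]
    for street, city, state in street_city_state_divisions(words):  # prints 6 lines
        print(" ".join(street) or "-", "|", " ".join(city), "|", " ".join(state))
```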