Hi,

Can you provide a dummy/example document for the optimization? It should
be as similar to your usual input as possible.

The size of the document and the coverage and number of annotations are
important key figures for the optimization.
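
If it helps, here is a rough sketch of how these figures could be
computed for a sample document. It is plain Python and does not use UIMA:
the annotations are assumed to be available as (begin, end) character
offsets, "coverage" is assumed to mean the fraction of the text covered by
at least one annotation, and the names covered_chars and key_figures are
only illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def covered_chars(spans: Iterable[Span]) -> int:
    """Count characters covered by at least one span (overlaps count once)."""
    total = 0
    cur_begin, cur_end = None, None
    for begin, end in sorted(spans):
        if cur_end is None or begin > cur_end:
            # Current merged run is finished; add its length and start a new one.
            if cur_end is not None:
                total += cur_end - cur_begin
            cur_begin, cur_end = begin, end
        else:
            # Span overlaps or touches the current run; extend the run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_begin
    return total


def key_figures(text: str, spans: Iterable[Span]) -> Dict[str, float]:
    """Return document size, annotation count and coverage (assumed definition)."""
    span_list: List[Span] = list(spans)
    size = len(text)
    return {
        "size_chars": size,
        "annotations": len(span_list),
        "coverage": covered_chars(span_list) / size if size else 0.0,
    }
```

Calling key_figures(text, spans) on a typical input document would give
the three numbers asked for above.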


Best,


Peter


On 20.07.2017 at 12:27, Gaurav Dudeja wrote:
> This is in reference to the question I raised on StackOverflow. According to
> @Peter Kluegl, there is a lot of room for improving the code, so I am eager
> to learn how I can improve this script:
> https://stackoverflow.com/questions/44351051/uima-ruta-out-of-memory-issue-in-spark-context
>
> =========================================================
> TYPESYSTEM EDMTypeSystem;
>
> WORDLIST EnglishStopWordList = 'en/anchor/en_stopWords.txt';
> WORDLIST FiltersList = 'en/anchor/AnchorFilters.txt';
> DECLARE Filters, EnglishStopWords;
> DECLARE Anchors, SpanStart,SpanClose;
>
> DocumentAnnotation{-> ADDRETAINTYPE(MARKUP)};
>
> DocumentAnnotation{-> MARKFAST(Filters, FiltersList)};
>
> STRING MixCharacterRegex = "[0-9]+[a-zA-Z]+";
>
> DocumentAnnotation{-> MARKFAST(EnglishStopWords, EnglishStopWordList,true)};
> (SW | CW | CAP ) { -> MARK(Anchors, 1, 2)};
> Anchors{CONTAINS(EnglishStopWords) -> UNMARK(Anchors)};
>
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP )
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords?
> { -> MARK(Anchors, 1, 4)};
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? (SW | CW | CAP )
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords?
> { -> MARK(Anchors, 1, 4)};
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP )
> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? EnglishStopWords?
> { -> MARK(Anchors, 1, 4)};
> (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords?
> { -> MARK(Anchors, 1, 3)};
>
> Anchors{CONTAINS(MARKUP) -> UNMARK(Anchors)};
> MixCharacterRegex -> Anchors;
>
> "<Value>"  -> SpanStart;
> "</Value>" -> SpanClose;
>
> Anchors{-> CREATE(ExtractedData, "type" = "ANCHOR", "value" = Anchors)};
>
> SpanStart Filters? SPACE? ExtractedData SPACE? Filters? SpanClose
> {-> GATHER(Data, 2, 6, "ExtractedData" = 4)};
> =========================================================
