Hi,
If you want me to improve the rules, you have to provide some representative text. If I make up some text and optimize the rules for it, I'll report a speedup of X. Then you test the optimized rules and, provided they still produce correct results (I would have had no realistic text to check that on), you measure a speedup of Y. Then we start again, with me asking for some representative text.

Best,
Peter

On 20.07.2017 at 14:52, Peter Klügl wrote:
> Hi,
>
> can you provide a dummy/exemplary document for the optimization? It
> should be as similar to your usual input as possible.
>
> The size of the document, its coverage and the number of annotations
> are important key figures for the optimization.
>
> Best,
>
> Peter
>
> On 20.07.2017 at 12:27, Gaurav Dudeja wrote:
>> This is in reference to the question I raised on StackOverflow. According
>> to @Peter Kluegl, there is a lot of room for improving the code, so I am
>> eager to learn how I can improve this script:
>> https://stackoverflow.com/questions/44351051/uima-ruta-out-of-memory-issue-in-spark-context
>>
>> =========================================================
>> TYPESYSTEM EDMTypeSystem;
>>
>> WORDLIST EnglishStopWordList = 'en/anchor/en_stopWords.txt';
>> WORDLIST FiltersList = 'en/anchor/AnchorFilters.txt';
>> DECLARE Filters, EnglishStopWords;
>> DECLARE Anchors, SpanStart,SpanClose;
>>
>> DocumentAnnotation{-> ADDRETAINTYPE(MARKUP)};
>>
>> DocumentAnnotation{-> MARKFAST(Filters, FiltersList)};
>>
>> STRING MixCharacterRegex = "[0-9]+[a-zA-Z]+";
>>
>> DocumentAnnotation{-> MARKFAST(EnglishStopWords, EnglishStopWordList,true)};
>> (SW | CW | CAP ) { -> MARK(Anchors, 1, 2)};
>> Anchors{CONTAINS(EnglishStopWords) -> UNMARK(Anchors)};
>>
>> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM)? EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SW | CW | CAP ) (SPECIAL{REGEXP("['\"-=()\\[\\]]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 3)};
>>
>> Anchors{CONTAINS(MARKUP) -> UNMARK(Anchors)};
>> MixCharacterRegex -> Anchors;
>>
>> "<Value>" -> SpanStart;
>> "</Value>" -> SpanClose;
>>
>> Anchors{-> CREATE(ExtractedData, "type" = "ANCHOR", "value" = Anchors)};
>>
>> SpanStart Filters? SPACE? ExtractedData SPACE? Filters? SpanClose{-> GATHER(Data, 2, 6, "ExtractedData" = 4)};
>> =========================================================
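One change that may be worth trying in the quoted script, regardless of the input data, is to compute the repeated delimiter alternative `(SPECIAL{REGEXP(...)}| PM)` once as its own annotation, instead of re-evaluating the REGEXP condition in every rule element that uses it. The following is a minimal UIMA Ruta sketch that would replace the four delimiter-based anchor rules; it assumes a new, hypothetical type `Delimiter` (not part of the original type system) and otherwise keeps the four patterns unchanged. Whether it gives a measurable speedup depends on the documents, so it still has to be judged on representative text. Separately, note that in the character class `['\"-=()\\[\\]]` the sequence `"-=` is read as a range from `"` to `=`, which also covers characters such as `#`, `%`, `+`, `,`, `.`, `/` and the digits; if a literal hyphen was intended, it would need to be escaped or moved to the end of the class. The sketch keeps the original pattern as-is.

```ruta
// Hypothetical helper type (not in the original script): every delimiter
// token is marked once, so the anchor rules below only test a type.
DECLARE Delimiter;

// Same condition as in the original rules, now evaluated once per SPECIAL token.
SPECIAL{REGEXP("['\"-=()\\[\\]]") -> MARK(Delimiter)};
PM{-> MARK(Delimiter)};

// The four anchor patterns from the original script, matching Delimiter instead.
Delimiter (SW | CW | CAP) Delimiter EnglishStopWords? {-> MARK(Anchors, 1, 4)};
Delimiter? (SW | CW | CAP) Delimiter EnglishStopWords? {-> MARK(Anchors, 1, 4)};
Delimiter (SW | CW | CAP) Delimiter? EnglishStopWords? {-> MARK(Anchors, 1, 4)};
(SW | CW | CAP) Delimiter EnglishStopWords? {-> MARK(Anchors, 1, 3)};
```

Because a token is either SPECIAL or PM, the `Delimiter` annotations cover exactly the tokens that the original alternative matched, so the anchors produced should be the same.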
