http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stoptags_ja.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stoptags_ja.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stoptags_ja.txt
new file mode 100644
index 0000000..71b7508
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stoptags_ja.txt
@@ -0,0 +1,420 @@
+#
+# This file defines a Japanese stoptag set for JapanesePartOfSpeechStopFilter.
+#
+# Any token with a part-of-speech tag that exactly matches those defined in this
+# file is removed from the token stream.
+#
+# Set your own stoptags by uncommenting the lines below. Note that comments are
+# not allowed on the same line as a stoptag. See LUCENE-3745 for frequency lists,
+# etc. that can be useful for building your own stoptag set.
+#
+# The entire possible tagset is provided below for convenience.
+#
+#####
+# noun: unclassified nouns
+#名詞
+#
+# noun-common: Common nouns or nouns where the sub-classification is undefined
+#名詞-一般
+#
+# noun-proper: Proper nouns where the sub-classification is undefined
+#名詞-固有名詞
+#
+# noun-proper-misc: miscellaneous proper nouns
+#名詞-固有名詞-一般
+#
+# noun-proper-person: Personal names where the sub-classification is undefined
+#名詞-固有名詞-人名
+#
+# noun-proper-person-misc: names that cannot be divided into surname and
+# given name; foreign names; names where the surname or given name is unknown.
+# e.g. お市の方
+#名詞-固有名詞-人名-一般
+#
+# noun-proper-person-surname: Mainly Japanese surnames.
+# e.g. 山田
+#名詞-固有名詞-人名-姓
+#
+# noun-proper-person-given_name: Mainly Japanese given names.
+# e.g. 太郎
+#名詞-固有名詞-人名-名
+#
+# noun-proper-organization: Names representing organizations.
+# e.g. 通産省, NHK
+#名詞-固有名詞-組織
+#
+# noun-proper-place: Place names where the sub-classification is undefined
+#名詞-固有名詞-地域
+#
+# noun-proper-place-misc: Place names excluding countries.
+# e.g. アジア, バルセロナ, 京都
+#名詞-固有名詞-地域-一般
+#
+# noun-proper-place-country: Country names.
+# e.g. 日本, オーストラリア
+#名詞-固有名詞-地域-国
+#
+# noun-pronoun: Pronouns where the sub-classification is undefined
+#名詞-代名詞
+#
+# noun-pronoun-misc: miscellaneous pronouns:
+# e.g. あれ, これ, あいつ, あなた, あちこち, いくつ, どこか, なに, みなさん, みんな, わたくし, われわれ
+#名詞-代名詞-一般
+#
+# noun-pronoun-contraction: Spoken language contraction made by combining a
+# pronoun and the particle 'wa'.
+# e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ
+#名詞-代名詞-縮約
+#
+# noun-adverbial: Temporal nouns such as names of days or months that behave
+# like adverbs. Nouns that represent amount or ratios and can be used adverbially,
+# e.g. 金曜, 一月, 午後, 少量
+#名詞-副詞可能
+#
+# noun-verbal: Nouns that take arguments with case and can appear followed by
+# 'suru' and related verbs (する, できる, なさる, くださる)
+# e.g. インプット, 愛着, 悪化, 悪戦苦闘, 一安心, 下取り
+#名詞-サ変接続
+#
+# noun-adjective-base: The base form of adjectives, words that appear before な ("na")
+# e.g. 健康, 安易, 駄目, だめ
+#名詞-形容動詞語幹
+#
+# noun-numeric: Arabic numbers, Chinese numerals, and counters like 何 (回), 数.
+# e.g. 0, 1, 2, 何, 数, 幾
+#名詞-数
+#
+# noun-affix: noun affixes where the sub-classification is undefined
+#名詞-非自立
+#
+# noun-affix-misc: Of adnominalizers, the case-marker の ("no"), and words that
+# attach to the base form of inflectional words, words that cannot be classified
+# into any of the other categories below. This category includes indefinite nouns.
+# e.g. あかつき, 暁, かい, 甲斐, 気, きらい, 嫌い, くせ, 癖, こと, 事, ごと, 毎, しだい, 次第,
+# 順, せい, 所為, ついで, 序で, つもり, 積もり, 点, どころ, の, はず, 筈, はずみ, 弾み,
+# 拍子, ふう, ふり, 振り, ほう, 方, 旨, もの, 物, 者, ゆえ, 故, ゆえん, 所以, わけ, 訳,
+# わり, 割り, 割, ん-口語/, もん-口語/
+#名詞-非自立-一般
+#
+# noun-affix-adverbial: noun affixes that can behave as adverbs.
+# e.g. あいだ, 間, あげく, 挙げ句, あと, 後, 余り, 以外, 以降, 以後, 以上, 以前, 一方, うえ,
+# 上, うち, 内, おり, 折り, かぎり, 限り, きり, っきり, 結果, ころ, 頃, さい, 際, 最中, さなか,
+# 最中, じたい, 自体, たび, 度, ため, 為, つど, 都度, とおり, 通り, とき, 時, ところ, 所,
+# とたん, 途端, なか, 中, のち, 後, ばあい, 場合, 日, ぶん, 分, ほか, 他, まえ, 前, まま,
+# 儘, 侭, みぎり, 矢先
+#名詞-非自立-副詞可能
+#
+# noun-affix-aux: noun affixes treated as 助動詞 ("auxiliary verb") in school grammars
+# with the stem よう(だ) ("you(da)").
+# e.g. よう, やう, 様 (よう)
+#名詞-非自立-助動詞語幹
+#
+# noun-affix-adjective-base: noun affixes that can connect to the indeclinable
+# connection form な (aux "da").
+# e.g. みたい, ふう
+#名詞-非自立-形容動詞語幹
+#
+# noun-special: special nouns where the sub-classification is undefined.
+#名詞-特殊
+#
+# noun-special-aux: The そうだ ("souda") stem form that is used for reporting news, is
+# treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the base
+# form of inflectional words.
+# e.g. そう
+#名詞-特殊-助動詞語幹
+#
+# noun-suffix: noun suffixes where the sub-classification is undefined.
+#名詞-接尾
+#
+# noun-suffix-misc: Of the nouns or stem forms of other parts of speech that connect
+# to ガル or タイ and can combine into compound nouns, words that cannot be classified into
+# any of the other categories below. In general, this category is more inclusive than
+# 接尾語 ("suffix") and is usually the last element in a compound noun.
+# e.g. おき, かた, 方, 甲斐 (がい), がかり, ぎみ, 気味, ぐるみ, (～した) さ, 次第, 済 (ず) み,
+# よう, (でき)っこ, 感, 観, 性, 学, 類, 面, 用
+#名詞-接尾-一般
+#
+# noun-suffix-person: Suffixes that form nouns and attach to person names more often
+# than other nouns.
+# e.g. 君, 様, 著
+#名詞-接尾-人名
+#
+# noun-suffix-place: Suffixes that form nouns and attach to place names more often
+# than other nouns.
+# e.g. 町, 市, 県
+#名詞-接尾-地域
+#
+# noun-suffix-verbal: Of the suffixes that attach to nouns and form nouns, those that
+# can appear before スル ("suru").
+# e.g. 化, 視, 分け, 入り, 落ち, 買い
+#名詞-接尾-サ変接続
+#
+# noun-suffix-aux: The stem form of そうだ (様態) that is used to indicate conditions,
+# is treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the
+# conjunctive form of inflectional words.
+# e.g. そう
+#名詞-接尾-助動詞語幹
+#
+# noun-suffix-adjective-base: Suffixes that attach to other nouns or the conjunctive
+# form of inflectional words and appear before the copula だ ("da").
+# e.g. 的, げ, がち
+#名詞-接尾-形容動詞語幹
+#
+# noun-suffix-adverbial: Suffixes that attach to other nouns and can behave as adverbs.
+# e.g. 後 (ご), 以後, 以降, 以前, 前後, 中, 末, 上, 時 (じ)
+#名詞-接尾-副詞可能
+#
+# noun-suffix-classifier: Suffixes that attach to numbers and form nouns. This category
+# is more inclusive than 助数詞 ("classifier") and includes common nouns that attach
+# to numbers.
+# e.g. 個, つ, 本, 冊, パーセント, cm, kg, カ月, か国, 区画, 時間, 時半
+#名詞-接尾-助数詞
+#
+# noun-suffix-special: Special suffixes that mainly attach to inflecting words.
+# e.g. (楽し) さ, (考え) 方
+#名詞-接尾-特殊
+#
+# noun-suffix-conjunctive: Nouns that behave like conjunctions and join two words
+# together.
+# e.g. (日本) 対 (アメリカ), 対 (アメリカ), (3) 対 (5), (女優) 兼 (主婦)
+#名詞-接続詞的
+#
+# noun-verbal_aux: Nouns that attach to the conjunctive particle て ("te") and are
+# semantically verb-like.
+# e.g. ごらん, ご覧, 御覧, 頂戴
+#名詞-動詞非自立的
+#
+# noun-quotation: text that cannot be segmented into words, proverbs, Chinese poetry,
+# dialects, English, etc. Currently, the only entry for 名詞 引用文字列 ("noun quotation")
+# is いわく ("iwaku").
+#名詞-引用文字列
+#
+# noun-nai_adjective: Words that appear before the auxiliary verb ない ("nai") and
+# behave like an adjective.
+# e.g. 申し訳, 仕方, とんでも, 違い
+#名詞-ナイ形容詞語幹
+#
+#####
+# prefix: unclassified prefixes
+#接頭詞
+#
+# prefix-nominal: Prefixes that attach to nouns (including adjective stem forms)
+# excluding numerical expressions.
+# e.g. お (水), 某 (氏), 同 (社), 故 (～氏), 高 (品質), お (見事), ご (立派)
+#接頭詞-名詞接続
+#
+# prefix-verbal: Prefixes that attach to the imperative form of a verb or a verb
+# in conjunctive form followed by なる/なさる/くださる.
+# e.g. お (読みなさい), お (座り)
+#接頭詞-動詞接続
+#
+# prefix-adjectival: Prefixes that attach to adjectives.
+# e.g. お (寒いですねえ), バカ (でかい)
+#接頭詞-形容詞接続
+#
+# prefix-numerical: Prefixes that attach to numerical expressions.
+# e.g. 約, およそ, 毎時
+#接頭詞-数接続
+#
+#####
+# verb: unclassified verbs
+#動詞
+#
+# verb-main:
+#動詞-自立
+#
+# verb-auxiliary:
+#動詞-非自立
+#
+# verb-suffix:
+#動詞-接尾
+#
+#####
+# adjective: unclassified adjectives
+#形容詞
+#
+# adjective-main:
+#形容詞-自立
+#
+# adjective-auxiliary:
+#形容詞-非自立
+#
+# adjective-suffix:
+#形容詞-接尾
+#
+#####
+# adverb: unclassified adverbs
+#副詞
+#
+# adverb-misc: Words that can be segmented into one unit and where adnominal
+# modification is not possible.
+# e.g. あいかわらず, 多分
+#副詞-一般
+#
+# adverb-particle_conjunction: Adverbs that can be followed by の, は, に,
+# な, する, だ, etc.
+# e.g. こんなに, そんなに, あんなに, なにか, なんでも
+#副詞-助詞類接続
+#
+#####
+# adnominal: Words that only have noun-modifying forms.
+# e.g. この, その, あの, どの, いわゆる, なんらかの, 何らかの, いろんな, こういう, そういう, ああいう,
+# どういう, こんな, そんな, あんな, どんな, 大きな, 小さな, おかしな, ほんの, たいした,
+# 「(, も) さる (ことながら)」, 微々たる, 堂々たる, 単なる, いかなる, 我が」「同じ, 亡き
+#連体詞
+#
+#####
+# conjunction: Conjunctions that can occur independently.
+# e.g. が, けれども, そして, じゃあ, それどころか
+接続詞
+#
+#####
+# particle: unclassified particles.
+助詞
+#
+# particle-case: case particles where the subclassification is undefined.
+助詞-格助詞
+#
+# particle-case-misc: Case particles.
+# e.g. から, が, で, と, に, へ, より, を, の, にて
+助詞-格助詞-一般
+#
+# particle-case-quote: the "to" that appears after nouns, a person’s speech,
+# quotation marks, expressions of decisions from a meeting, reasons, judgements,
+# conjectures, etc.
+# e.g. ( だ) と (述べた.), ( である) と (して執行猶予...)
+助詞-格助詞-引用
+#
+# particle-case-compound: Compounds of particles and verbs that mainly behave
+# like case particles.
+# e.g. という, といった, とかいう, として, とともに, と共に, でもって, にあたって, に当たって, に当って,
+# にあたり, に当たり, に当り, に当たる, にあたる, において, に於いて,に於て, における, に於ける,
+# にかけ, にかけて, にかんし, に関し, にかんして, に関して, にかんする, に関する, に際し,
+# に際して, にしたがい, に従い, に従う, にしたがって, に従って, にたいし, に対し, にたいして,
+# に対して, にたいする, に対する, について, につき, につけ, につけて, につれ, につれて, にとって,
+# にとり, にまつわる, によって, に依って, に因って, により, に依り, に因り, による, に依る, に因る,
+# にわたって, にわたる, をもって, を以って, を通じ, を通じて, を通して, をめぐって, をめぐり, をめぐる,
+# って-口語/, ちゅう-関西弁「という」/, (何) ていう (人)-口語/, っていう-口語/, といふ, とかいふ
+助詞-格助詞-連語
+#
+# particle-conjunctive:
+# e.g. から, からには, が, けれど, けれども, けど, し, つつ, て, で, と, ところが, どころか, とも, ども,
+# ながら, なり, ので, のに, ば, ものの, や ( した), やいなや, (ころん) じゃ(いけない)-口語/,
+# (行っ) ちゃ(いけない)-口語/, (言っ) たって (しかたがない)-口語/, (それがなく)ったって (平気)-口語/
+助詞-接続助詞
+#
+# particle-dependency:
+# e.g. こそ, さえ, しか, すら, は, も, ぞ
+助詞-係助詞
+#
+# particle-adverbial:
+# e.g. がてら, かも, くらい, 位, ぐらい, しも, (学校) じゃ(これが流行っている)-口語/,
+# (それ)じゃあ (よくない)-口語/, ずつ, (私) なぞ, など, (私) なり (に), (先生) なんか (大嫌い)-口語/,
+# (私) なんぞ, (先生) なんて (大嫌い)-口語/, のみ, だけ, (私) だって-口語/, だに,
+# (彼)ったら-口語/, (お茶) でも (いかが), 等 (とう), (今後) とも, ばかり, ばっか-口語/, ばっかり-口語/,
+# ほど, 程, まで, 迄, (誰) も (が)([助詞-格助詞] および [助詞-係助詞] の前に位置する「も」)
+助詞-副助詞
+#
+# particle-interjective: particles with interjective grammatical roles.
+# e.g. (松島) や
+助詞-間投助詞
+#
+# particle-coordinate:
+# e.g. と, たり, だの, だり, とか, なり, や, やら
+助詞-並立助詞
+#
+# particle-final:
+# e.g. かい, かしら, さ, ぜ, (だ)っけ-口語/, (とまってる) で-方言/, な, ナ, なあ-口語/, ぞ, ね, ネ,
+# ねぇ-口語/, ねえ-口語/, ねん-方言/, の, のう-口語/, や, よ, ヨ, よぉ-口語/, わ, わい-口語/
+助詞-終助詞
+#
+# particle-adverbial/conjunctive/final: The particle "ka" when unknown whether it is
+# adverbial, conjunctive, or sentence final. For example:
+# (a) 「A か B か」. Ex:「(国内で運用する) か,(海外で運用する) か (.)」
+# (b) Inside an adverb phrase. Ex:「(幸いという) か (, 死者はいなかった.)」
+# 「(祈りが届いたせい) か (, 試験に合格した.)」
+# (c) 「かのように」. Ex:「(何もなかった) か (のように振る舞った.)」
+# e.g. か
+助詞-副助詞／並立助詞／終助詞
+#
+# particle-adnominalizer: The "no" that attaches to nouns and modifies
+# non-inflectional words.
+助詞-連体化
+#
+# particle-adnominalizer: The "ni" and "to" that appear following nouns and adverbs
+# that are giongo, giseigo, or gitaigo.
+# e.g. に, と
+助詞-副詞化
+#
+# particle-special: A particle that does not fit into one of the above classifications.
+# This includes particles that are used in Tanka, Haiku, and other poetry.
+# e.g. かな, けむ, ( しただろう) に, (あんた) にゃ(わからん), (俺) ん (家)
+助詞-特殊
+#
+#####
+# auxiliary-verb:
+助動詞
+#
+#####
+# interjection: Greetings and other exclamations.
+# e.g. おはよう, おはようございます, こんにちは, こんばんは, ありがとう, どうもありがとう, ありがとうございます,
+# いただきます, ごちそうさま, さよなら, さようなら, はい, いいえ, ごめん, ごめんなさい
+#感動詞
+#
+#####
+# symbol: unclassified Symbols.
+記号
+#
+# symbol-misc: A general symbol not in one of the categories below.
+# e.g. [○◎@$〒→+]
+記号-一般
+#
+# symbol-comma: Commas
+# e.g. [,、]
+記号-読点
+#
+# symbol-period: Periods and full stops.
+# e.g. [.．。]
+記号-句点
+#
+# symbol-space: Full-width whitespace.
+記号-空白
+#
+# symbol-open_bracket:
+# e.g. [({‘“『【]
+記号-括弧開
+#
+# symbol-close_bracket:
+# e.g. [)}’”』」】]
+記号-括弧閉
+#
+# symbol-alphabetic:
+#記号-アルファベット
+#
+#####
+# other: unclassified other
+#その他
+#
+# other-interjection: Words that are hard to classify as noun-suffixes or
+# sentence-final particles.
+# e.g. (だ)ァ
+その他-間投
+#
+#####
+# filler: Aizuchi that occurs during a conversation or sounds inserted as filler.
+# e.g. あの, うんと, えと
+フィラー
+#
+#####
+# non-verbal: non-verbal sound.
+非言語音
+#
+#####
+# fragment:
+#語断片
+#
+#####
+# unknown: unknown part of speech.
+#未知語
+#
+##### End of file
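The following Python sketch shows the matching rule stated in the header of stoptags_ja.txt. Lines starting with `#` are inactive, and a token is removed only when its complete part-of-speech tag equals an active entry. This is an illustration under those assumptions, not Lucene's JapanesePartOfSpeechStopFilter. The helper names `load_stoptags` and `filter_tokens` and the sample (surface, tag) tokens are hypothetical.

```python
# Minimal sketch (not Lucene's implementation) of exact-match part-of-speech
# stop filtering as described in the stoptags_ja.txt header.
# load_stoptags and filter_tokens are hypothetical helper names.


def load_stoptags(lines):
    """Return the set of active stoptags from stoptags_ja.txt-style lines.

    Blank lines and lines starting with '#' (comments and commented-out
    tags) are skipped; every other stripped line is taken as one tag.
    """
    tags = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tags.add(line)
    return tags


def filter_tokens(tokens, stoptags):
    """Keep only (surface, pos) pairs whose full POS tag is not a stoptag.

    Matching is exact string equality: a listed parent tag such as 助詞
    does not remove a token tagged 助詞-係助詞.
    """
    return [(surface, pos) for surface, pos in tokens if pos not in stoptags]


if __name__ == "__main__":
    # Excerpt in the same layout as the file: commented tags are inactive.
    sample = [
        "# noun-common: Common nouns or nouns where the sub-classification is undefined",
        "#名詞-一般",
        "#",
        "助詞-係助詞",
        "助詞-格助詞-一般",
        "記号-句点",
    ]
    stoptags = load_stoptags(sample)

    # Hypothetical (surface, POS tag) output for 私は猫が好き。
    tokens = [
        ("私", "名詞-代名詞-一般"),
        ("は", "助詞-係助詞"),
        ("猫", "名詞-一般"),
        ("が", "助詞-格助詞-一般"),
        ("好き", "名詞-形容動詞語幹"),
        ("。", "記号-句点"),
    ]
    print(filter_tokens(tokens, stoptags))
    # prints [('私', '名詞-代名詞-一般'), ('猫', '名詞-一般'), ('好き', '名詞-形容動詞語幹')]
```

Because the match is exact, listing 助詞 alone would not drop tokens tagged 助詞-係助詞. This is consistent with the file listing each particle sub-tag on its own line.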
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ar.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ar.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ar.txt
new file mode 100644
index 0000000..046829d
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ar.txt
@@ -0,0 +1,125 @@
+# This file was created by Jacques Savoy and is distributed under the BSD license.
+# See http://members.unine.ch/jacques.savoy/clef/index.html.
+# Also see http://www.opensource.org/licenses/bsd-license.html
+# Cleaned on October 11, 2009 (not normalized, so use before normalization)
+# This means that when modifying this list, you might need to add some
+# redundant entries, for example containing forms with both أ and ا
+من
+ومن
+منها
+منه
+في
+وفي
+فيها
+فيه
+و
+ف
+ثم
+او
+أو
+ب
+بها
+به
+ا
+أ
+اى
+اي
+أي
+أى
+لا
+ولا
+الا
+ألا
+إلا
+لكن
+ما
+وما
+كما
+فما
+عن
+مع
+اذا
+إذا
+ان
+أن
+إن
+انها
+أنها
+إنها
+انه
+أنه
+إنه
+بان
+بأن
+فان
+فأن
+وان
+وأن
+وإن
+التى
+التي
+الذى
+الذي
+الذين
+الى
+الي
+إلى
+إلي
+على
+عليها
+عليه
+اما
+أما
+إما
+ايضا
+أيضا
+كل
+وكل
+لم
+ولم
+لن
+ولن
+هى
+هي
+هو
+وهى
+وهي
+وهو
+فهى
+فهي
+فهو
+انت
+أنت
+لك
+لها
+له
+هذه
+هذا
+تلك
+ذلك
+هناك
+كانت
+كان
+يكون
+تكون
+وكانت
+وكان
+غير
+بعض
+قد
+نحو
+بين
+بينما
+منذ
+ضمن
+حيث
+الان
+الآن
+خلال
+بعد
+قبل
+حتى
+عند
+عندما
+لدى
+جميع
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_bg.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_bg.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_bg.txt
new file mode 100644
index 0000000..1ae4ba2
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_bg.txt
@@ -0,0 +1,193 @@
+# This file was created by Jacques Savoy and is distributed under the BSD license.
+# See http://members.unine.ch/jacques.savoy/clef/index.html.
+# Also see http://www.opensource.org/licenses/bsd-license.html
+а
+аз
+ако
+ала
+бе
+без
+беше
+би
+бил
+била
+били
+било
+близо
+бъдат
+бъде
+бяха
+в
+вас
+ваш
+ваша
+вероятно
+вече
+взема
+ви
+вие
+винаги
+все
+всеки
+всички
+всичко
+всяка
+във
+въпреки
+върху
+г
+ги
+главно
+го
+д
+да
+дали
+до
+докато
+докога
+дори
+досега
+доста
+е
+едва
+един
+ето
+за
+зад
+заедно
+заради
+засега
+затова
+защо
+защото
+и
+из
+или
+им
+има
+имат
+иска
+й
+каза
+как
+каква
+какво
+както
+какъв
+като
+кога
+когато
+което
+които
+кой
+който
+колко
+която
+къде
+където
+към
+ли
+м
+ме
+между
+мен
+ми
+мнозина
+мога
+могат
+може
+моля
+момента
+му
+н
+на
+над
+назад
+най
+направи
+напред
+например
+нас
+не
+него
+нея
+ни
+ние
+никой
+нито
+но
+някои
+някой
+няма
+обаче
+около
+освен
+особено
+от
+отгоре
+отново
+още
+пак
+по
+повече
+повечето
+под
+поне
+поради
+после
+почти
+прави
+пред
+преди
+през
+при
+пък
+първо
+с
+са
+само
+се
+сега
+си
+скоро
+след
+сме
+според
+сред
+срещу
+сте
+съм
+със
+също
+т
+тази
+така
+такива
+такъв
+там
+твой
+те
+тези
+ти
+тн
+то
+това
+тогава
+този
+той
+толкова
+точно
+трябва
+тук
+тъй
+тя
+тях
+у
+харесва
+ч
+че
+често
+чрез
+ще
+щом
+я
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ca.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ca.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ca.txt
new file mode 100644
index 0000000..3da65de
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ca.txt
@@ -0,0 +1,220 @@
+# Catalan stopwords from http://github.com/vcl/cue.language (Apache 2 Licensed)
+a
+abans
+ací
+ah
+així
+això
+al
+als
+aleshores
+algun
+alguna
+algunes
+alguns
+alhora
+allà
+allí
+allò
+altra
+altre
+altres
+amb
+ambdós
+ambdues
+apa
+aquell
+aquella
+aquelles
+aquells
+aquest
+aquesta
+aquestes
+aquests
+aquí
+baix
+cada
+cadascú
+cadascuna
+cadascunes
+cadascuns
+com
+contra
+d'un
+d'una
+d'unes
+d'uns
+dalt
+de
+del
+dels
+des
+després
+dins
+dintre
+donat
+doncs
+durant
+e
+eh
+el
+els
+em
+en
+encara
+ens
+entre
+érem
+eren
+éreu
+es
+és
+esta
+està
+estàvem
+estaven
+estàveu
+esteu
+et
+etc
+ets
+fins
+fora
+gairebé
+ha
+han
+has
+havia
+he
+hem
+heu
+hi
+ho
+i
+igual
+iguals
+ja
+l'hi
+la
+les
+li
+li'n
+llavors
+m'he
+ma
+mal
+malgrat
+mateix
+mateixa
+mateixes
+mateixos
+me
+mentre
+més
+meu
+meus
+meva
+meves
+molt
+molta
+moltes
+molts
+mon
+mons
+n'he
+n'hi
+ne
+ni
+no
+nogensmenys
+només
+nosaltres
+nostra
+nostre
+nostres
+o
+oh
+oi
+on
+pas
+pel
+pels
+per
+però
+perquè
+poc
+poca
+pocs
+poques
+potser
+propi
+qual
+quals
+quan
+quant
+que
+què
+quelcom
+qui
+quin
+quina
+quines
+quins
+s'ha
+s'han
+sa
+semblant
+semblants
+ses
+seu
+seus
+seva
+seva
+seves
+si
+sobre
+sobretot
+sóc
+solament
+sols
+son
+són
+sons
+sota
+sou
+t'ha
+t'han
+t'he
+ta
+tal
+també
+tampoc
+tan
+tant
+tanta
+tantes
+teu
+teus
+teva
+teves
+ton
+tons
+tot
+tota
+totes
+tots
+un
+una
+unes
+uns
+us
+va
+vaig
+vam
+van
+vas
+veu
+vosaltres
+vostra
+vostre
+vostres
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_cz.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_cz.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_cz.txt
new file mode 100644
index 0000000..53c6097
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_cz.txt
@@ -0,0 +1,172 @@
+a
+s
+k
+o
+i
+u
+v
+z
+dnes
+cz
+tímto
+budeš
+budem
+byli
+jseš
+můj
+svým
+ta
+tomto
+tohle
+tuto
+tyto
+jej
+zda
+proč
+máte
+tato
+kam
+tohoto
+kdo
+kteří
+mi
+nám
+tom
+tomuto
+mít
+nic
+proto
+kterou
+byla
+toho
+protože
+asi
+ho
+naši
+napište
+re
+což
+tím
+takže
+svých
+její
+svými
+jste
+aj
+tu
+tedy
+teto
+bylo
+kde
+ke
+pravé
+ji
+nad
+nejsou
+či
+pod
+téma
+mezi
+přes
+ty
+pak
+vám
+ani
+když
+však
+neg
+jsem
+tento
+článku
+články
+aby
+jsme
+před
+pta
+jejich
+byl
+ještě
+až
+bez
+také
+pouze
+první
+vaše
+která
+nás
+nový
+tipy
+pokud
+může
+strana
+jeho
+své
+jiné
+zprávy
+nové
+není
+vás
+jen
+podle
+zde
+už
+být
+více
+bude
+již
+než
+který
+by
+které
+co
+nebo
+ten
+tak
+má
+při
+od
+po
+jsou
+jak
+další
+ale
+si
+se
+ve
+to
+jako
+za
+zpět
+ze
+do
+pro
+je
+na
+atd
+atp
+jakmile
+přičemž
+já
+on
+ona
+ono
+oni
+ony
+my
+vy
+jí
+ji
+mě
+mne
+jemu
+tomu
+těm
+těmu
+němu
+němuž
+jehož
+jíž
+jelikož
+jež
+jakož
+načež
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_da.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_da.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_da.txt
new file mode 100644
index 0000000..42e6145
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_da.txt
@@ -0,0 +1,110 @@
+ | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
+ | This file is distributed under the BSD License.
+ | See http://snowball.tartarus.org/license.php
+ | Also see http://www.opensource.org/licenses/bsd-license.html
+ | - Encoding was converted to UTF-8.
+ | - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
+ | A Danish stop word list. Comments begin with vertical bar. Each stop
+ | word is at the start of a line.
+
+ | This is a ranked list (commonest to rarest) of stopwords derived from
+ | a large text sample.
+
+
+og | and
+i | in
+jeg | I
+det | that (dem. pronoun)/it (pers. pronoun)
+at | that (in front of a sentence)/to (with infinitive)
+en | a/an
+den | it (pers. pronoun)/that (dem. pronoun)
+til | to/at/for/until/against/by/of/into, more
+er | present tense of "to be"
+som | who, as
+på | on/upon/in/on/at/to/after/of/with/for, on
+de | they
+med | with/by/in, along
+han | he
+af | of/by/from/off/for/in/with/on, off
+for | at/for/to/from/by/of/ago, in front/before, because
+ikke | not
+der | who/which, there/those
+var | past tense of "to be"
+mig | me/myself
+sig | oneself/himself/herself/itself/themselves
+men | but
+et | a/an/one, one (number), someone/somebody/one
+har | present tense of "to have"
+om | round/about/for/in/a, about/around/down, if
+vi | we
+min | my
+havde | past tense of "to have"
+ham | him
+hun | she
+nu | now
+over | over/above/across/by/beyond/past/on/about, over/past
+da | then, when/as/since
+fra | from/off/since, off, since
+du | you
+ud | out
+sin | his/her/its/one's
+dem | them
+os | us/ourselves
+op | up
+man | you/one
+hans | his
+hvor | where
+eller | or
+hvad | what
+skal | must/shall etc.
+selv | myself/yourself/herself/ourselves etc., even
+her | here
+alle | all/everyone/everybody etc.
+vil | will (verb)
+blev | past tense of "to stay/to remain/to get/to become"
+kunne | could
+ind | in
+når | when
+være | present tense of "to be"
+dog | however/yet/after all
+noget | something
+ville | would
+jo | you know/you see (adv), yes
+deres | their/theirs
+efter | after/behind/according to/for/by/from, later/afterwards
+ned | down
+skulle | should
+denne | this
+end | than
+dette | this
+mit | my/mine
+også | also
+under | under/beneath/below/during, below/underneath
+have | have
+dig | you
+anden | other
+hende | her
+mine | my
+alt | everything
+meget | much/very, plenty of
+sit | his, her, its, one's
+sine | his, her, its, one's
+vor | our
+mod | against
+disse | these
+hvis | if
+din | your/yours
+nogle | some
+hos | by/at
+blive | be/become
+mange | many
+ad | by/through
+bliver | present tense of "to be/to become"
+hendes | her/hers
+været | be
+thi | for (conj)
+jer | you
+sådan | such, like this/like that
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_de.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_de.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_de.txt
new file mode 100644
index 0000000..86525e7
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_de.txt
@@ -0,0 +1,294 @@
+ | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
+ | This file is distributed under the BSD License.
+ | See http://snowball.tartarus.org/license.php
+ | Also see http://www.opensource.org/licenses/bsd-license.html
+ | - Encoding was converted to UTF-8.
+ | - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
+ | A German stop word list. Comments begin with vertical bar. Each stop
+ | word is at the start of a line.
+
+ | The number of forms in this list is reduced significantly by passing it
+ | through the German stemmer.
+
+
+aber | but
+
+alle | all
+allem
+allen
+aller
+alles
+
+als | than, as
+also | so
+am | an + dem
+an | at
+
+ander | other
+andere
+anderem
+anderen
+anderer
+anderes
+anderm
+andern
+anderr
+anders
+
+auch | also
+auf | on
+aus | out of
+bei | by
+bin | am
+bis | until
+bist | art
+da | there
+damit | with it
+dann | then
+
+der | the
+den
+des
+dem
+die
+das
+
+daß | that
+
+derselbe | the same
+derselben
+denselben
+desselben
+demselben
+dieselbe
+dieselben
+dasselbe
+
+dazu | to that
+
+dein | thy
+deine
+deinem
+deinen
+deiner
+deines
+
+denn | because
+
+derer | of those
+dessen | of him
+
+dich | thee
+dir | to thee
+du | thou
+
+dies | this
+diese
+diesem
+diesen
+dieser
+dieses
+
+
+doch | (several meanings)
+dort | (over) there
+
+
+durch | through
+
+ein | a
+eine
+einem
+einen
+einer
+eines
+
+einig | some
+einige
+einigem
+einigen
+einiger
+einiges
+
+einmal | once
+
+er | he
+ihn | him
+ihm | to him
+
+es | it
+etwas | something
+
+euer | your
+eure
+eurem
+euren
+eurer
+eures
+
+für | for
+gegen | towards
+gewesen | p.p. of sein
+hab | have
+habe | have
+haben | have
+hat | has
+hatte | had
+hatten | had
+hier | here
+hin | there
+hinter | behind
+
+ich | I
+mich | me
+mir | to me
+
+
+ihr | you, to her
+ihre
+ihrem
+ihren
+ihrer
+ihres
+euch | to you
+
+im | in + dem
+in | in
+indem | while
+ins | in + das
+ist | is
+
+jede | each, every
+jedem
+jeden
+jeder
+jedes
+
+jene | that
+jenem
+jenen
+jener
+jenes
+
+jetzt | now
+kann | can
+
+kein | no
+keine
+keinem
+keinen
+keiner
+keines
+
+können | can
+könnte | could
+machen | do
+man | one
+
+manche | some, many a
+manchem
+manchen
+mancher
+manches
+
+mein | my
+meine
+meinem
+meinen
+meiner
+meines
+
+mit | with
+muss | must
+musste | had to
+nach | to(wards)
+nicht | not
+nichts | nothing
+noch | still, yet
+nun | now
+nur | only
+ob | whether
+oder | or
+ohne | without
+sehr | very
+
+sein | his
+seine
+seinem
+seinen
+seiner
+seines
+
+selbst | self
+sich | herself
+
+sie | they, she
+ihnen | to them
+
+sind | are
+so | so
+
+solche | such
+solchem
+solchen
+solcher
+solches
+
+soll | shall
+sollte | should
+sondern | but
+sonst | else
+über | over
+um | about, around
+und | and
+
+uns | us
+unse
+unsem
+unsen
+unser
+unses
+
+unter | under
+viel | much
+vom | von + dem
+von | from
+vor | before
+während | while
+war | was
+waren | were
+warst | wast
+was | what
+weg | away, off
+weil | because
+weiter | further
+
+welche | which
+welchem
+welchen
+welcher
+welches
+
+wenn | when
+werde | will
+werden | will
+wie | how
+wieder | again
+will | want
+wir | we
+wird | will
+wirst | willst
+wo | where
+wollen | want
+wollte | wanted
+würde | would
+würden | would
+zu | to
+zum | zu + dem
+zur | zu + der
+zwar | indeed
+zwischen | between
+
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_el.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_el.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_el.txt
new file mode 100644
index 0000000..232681f
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_el.txt
@@ -0,0 +1,78 @@
+# Lucene Greek Stopwords list
+# Note: by default this file is used after GreekLowerCaseFilter,
+# so when modifying this file use 'σ' instead of 'ς'
+ο
+η
+το
+οι
+τα
+του
+της
+των
+τον
+την
+και
+κι
+κ
+ειμαι
+εισαι
+ειναι
+ειμαστε
+ειστε
+στο
+στον
+στη
+στην
+μα
+αλλα
+απο
+για
+προς
+με
+σε
+ως
+παρα
+αντι
+κατα
+μετα
+θα
+να
+δε
+δεν
+μη
+μην
+επι
+ενω
+εαν
+αν
+τοτε
+που
+πως
+ποιος
+ποια
+ποιο
+ποιοι
+ποιες
+ποιων
+ποιους
+αυτος
+αυτη
+αυτο
+αυτοι
+αυτων
+αυτους
+αυτες
+αυτα
+εκεινος
+εκεινη
+εκεινο
+εκεινοι
+εκεινες
+εκεινα
+εκεινων
+εκεινους
+οπως
+ομως
+ισως
+οσο
+οτι
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_en.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_en.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_en.txt
new file mode 100644
index 0000000..2c164c0
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_en.txt
@@ -0,0 +1,54 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# a couple of test stopwords to test that the words are really being
+# configured from this file:
+stopworda
+stopwordb
+
+# Standard English stop words taken from Lucene's StopAnalyzer
+a
+an
+and
+are
+as
+at
+be
+but
+by
+for
+if
+in
+into
+is
+it
+no
+not
+of
+on
+or
+such
+that
+the
+their
+then
+there
+these
+they
+this
+to
+was
+will
+with
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_es.txt
----------------------------------------------------------------------
diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_es.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_es.txt
new file mode 100644
index 0000000..487d78c
--- /dev/null
+++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_es.txt
@@ -0,0 +1,356 @@
+ | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
+ | This file is distributed under the BSD License.
+ | See http://snowball.tartarus.org/license.php
+ | Also see http://www.opensource.org/licenses/bsd-license.html
+ | - Encoding was converted to UTF-8.
+ | - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
+ | A Spanish stop word list. Comments begin with vertical bar. Each stop
+ | word is at the start of a line.
+
+
+ | The following is a ranked list (commonest to rarest) of stopwords
+ | deriving from a large sample of text.
+
+ | Extra words have been added at the end.
+ +de | from, of +la | the, her +que | who, that +el | the +en | in +y | and +a | to +los | the, them +del | de + el +se | himself, from him etc +las | the, them +por | for, by, etc +un | a +para | for +con | with +no | no +una | a +su | his, her +al | a + el + | es from SER +lo | him +como | how +más | more +pero | pero +sus | su plural +le | to him, her +ya | already +o | or + | fue from SER +este | this + | ha from HABER +sí | himself etc +porque | because +esta | this + | son from SER +entre | between + | está from ESTAR +cuando | when +muy | very +sin | without +sobre | on + | ser from SER + | tiene from TENER +también | also +me | me +hasta | until +hay | there is/are +donde | where + | han from HABER +quien | whom, that + | están from ESTAR + | estado from ESTAR +desde | from +todo | all +nos | us +durante | during + | estados from ESTAR +todos | all +uno | a +les | to them +ni | nor +contra | against +otros | other + | fueron from SER +ese | that +eso | that + | había from HABER +ante | before +ellos | they +e | and (variant of y) +esto | this +mí | me +antes | before +algunos | some +qué | what?
+unos | a +yo | I +otro | other +otras | other +otra | other +él | he +tanto | so much, many +esa | that +estos | these +mucho | much, many +quienes | who +nada | nothing +muchos | many +cual | who + | sea from SER +poco | few +ella | she +estar | to be + | haber from HABER +estas | these + | estaba from ESTAR + | estamos from ESTAR +algunas | some +algo | something +nosotros | we + + | other forms + +mi | me +mis | mi plural +tú | thou +te | thee +ti | thee +tu | thy +tus | tu plural +ellas | they +nosotras | we +vosotros | you +vosotras | you +os | you +mío | mine +mía | +míos | +mías | +tuyo | thine +tuya | +tuyos | +tuyas | +suyo | his, hers, theirs +suya | +suyos | +suyas | +nuestro | ours +nuestra | +nuestros | +nuestras | +vuestro | yours +vuestra | +vuestros | +vuestras | +esos | those +esas | those + + | forms of estar, to be (not including the infinitive): +estoy +estás +está +estamos +estáis +están +esté +estés +estemos +estéis +estén +estaré +estarás +estará +estaremos +estaréis +estarán +estaría +estarías +estaríamos +estaríais +estarían +estaba +estabas +estábamos +estabais +estaban +estuve +estuviste +estuvo +estuvimos +estuvisteis +estuvieron +estuviera +estuvieras +estuviéramos +estuvierais +estuvieran +estuviese +estuvieses +estuviésemos +estuvieseis +estuviesen +estando +estado +estada +estados +estadas +estad + + | forms of haber, to have (not including the infinitive): +he +has +ha +hemos +habéis +han +haya +hayas +hayamos +hayáis +hayan +habré +habrás +habrá +habremos +habréis +habrán +habría +habrías +habríamos +habríais +habrían +había +habías +habíamos +habíais +habían +hube +hubiste +hubo +hubimos +hubisteis +hubieron +hubiera +hubieras +hubiéramos +hubierais +hubieran +hubiese +hubieses +hubiésemos +hubieseis +hubiesen +habiendo +habido +habida +habidos +habidas + + | forms of ser, to be (not including the infinitive): +soy +eres +es +somos +sois +son +sea +seas +seamos +seáis +sean +seré +serás +será +seremos +seréis +serán +sería
+serías +seríamos +seríais +serían +era +eras +éramos +erais +eran +fui +fuiste +fue +fuimos +fuisteis +fueron +fuera +fueras +fuéramos +fuerais +fueran +fuese +fueses +fuésemos +fueseis +fuesen +siendo +sido + | sed also means 'thirst' + + | forms of tener, to have (not including the infinitive): +tengo +tienes +tiene +tenemos +tenéis +tienen +tenga +tengas +tengamos +tengáis +tengan +tendré +tendrás +tendrá +tendremos +tendréis +tendrán +tendría +tendrías +tendríamos +tendríais +tendrían +tenía +tenías +teníamos +teníais +tenían +tuve +tuviste +tuvo +tuvimos +tuvisteis +tuvieron +tuviera +tuvieras +tuviéramos +tuvierais +tuvieran +tuviese +tuvieses +tuviésemos +tuvieseis +tuviesen +teniendo +tenido +tenida +tenidos +tenidas +tened + http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_eu.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_eu.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_eu.txt new file mode 100644 index 0000000..25f1db9 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_eu.txt @@ -0,0 +1,99 @@ +# example set of basque stopwords +al +anitz +arabera +asko +baina +bat +batean +batek +bati +batzuei +batzuek +batzuetan +batzuk +bera +beraiek +berau +berauek +bere +berori +beroriek +beste +bezala +da +dago +dira +ditu +du +dute +edo +egin +ere +eta +eurak +ez +gainera +gu +gutxi +guzti +haiei +haiek +haietan +hainbeste +hala +han +handik +hango +hara +hari +hark +hartan +hau +hauei +hauek +hauetan +hemen +hemendik +hemengo +hi +hona +honek +honela +honetan +honi +hor +hori +horiei +horiek +horietan +horko +horra +horrek +horrela +horretan +horri +hortik +hura +izan +ni +noiz +nola +non +nondik +nongo +nor +nora +ze +zein +zen +zenbait +zenbat +zer +zergatik +ziren +zituen +zu +zuek +zuen +zuten
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fa.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fa.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fa.txt new file mode 100644 index 0000000..723641c --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fa.txt @@ -0,0 +1,313 @@ +# This file was created by Jacques Savoy and is distributed under the BSD license. +# See http://members.unine.ch/jacques.savoy/clef/index.html. +# Also see http://www.opensource.org/licenses/bsd-license.html +# Note: by default this file is used after normalization, so when adding entries +# to this file, use the arabic 'Ù' instead of 'Û' +Ø§ÙØ§Ù +ÙØ¯Ø§Ø´ØªÙ +سراسر +Ø®ÙØ§Ù +Ø§ÙØ´Ø§Ù +ÙÙ +تاÙÙÙÙ +Ø¨ÙØ´ØªØ±Ù +دÙÙ +پس +ÙØ§Ø´Ù +ÙÚ¯Ù +ÙØ§ +Ø¯Ø§Ø´ØªÙØ¯ +سپس +ÙÙگا٠+ÙØ±Ú¯Ø² +Ù¾ÙØ¬ +ÙØ´Ø§Ù +ا٠سا٠+دÙگر +گرÙÙÙ +Ø´Ø¯ÙØ¯ +ÚØ·Ùر +د٠+Ù +د٠+ÙØ®Ø³ØªÙÙ +ÙÙÙ +ÚØ±Ø§ +ÚÙ +ÙØ³Ø· +Ù +ÙØ¯Ø§Ù +ÙØ§Ø¨Ù +ÙÙ +Ø±ÙØª +ÙÙØª +ÙÙ ÚÙÙÙ +در +ÙØ²Ø§Ø± +بÙÙ +بÙÙ +Ø´Ø§ÙØ¯ +ا٠ا +Ø´ÙØ§Ø³Ù +Ú¯Ø±ÙØªÙ +Ø¯ÙØ¯ +داشت٠+Ø¯Ø§ÙØ³Øª +داشت٠+Ø®ÙØ§ÙÙÙ +Ù ÙÙÙØ§Ø±Ø¯ +ÙÙØªÙÙÙ +ا٠د +Ø®ÙØ§Ùد +جز +Ø§ÙØ±Ø¯Ù +شد٠+بÙÙÙ +خد٠ات +شد٠+برخ٠+ÙØ¨Ùد +Ø¨Ø³ÙØ§Ø±Ù +جÙÙÚ¯ÙØ±Ù +ØÙ +ÙØ±Ø¯Ùد +ÙÙØ¹Ù +بعر٠+ÙÙØ±Ø¯Ù +ÙØ¸Ùر +ÙØ¨Ø§Ùد +Ø¨ÙØ¯Ù +Ø¨ÙØ¯Ù +داد +Ø§ÙØ±Ø¯ +ÙØ³Øª +جاÙÙ +Ø´ÙØ¯ +Ø¯ÙØ¨Ø§Ù +داد٠+Ø¨Ø§ÙØ¯ +ساب٠+ÙÙÚ +Ù٠ا٠+Ø§ÙØ¬Ø§ +Ù٠تر +ÙØ¬Ø§Ø³Øª +گردد +ÙØ³Ù +تر +٠رد٠+تا٠+داد٠+Ø¨ÙØ¯Ùد +سر٠+جدا +ÙØ¯Ø§Ø±Ùد +٠گر +ÙÙØ¯Ùگر +دارد +دÙÙØ¯ +Ø¨ÙØ§Ø¨Ø±Ø§ÙÙ +ÙÙگا٠٠+س٠ت +جا +اÙÚÙ +Ø®ÙØ¯ +Ø¯Ø§Ø¯ÙØ¯ +Ø²ÙØ§Ø¯ +Ø¯Ø§Ø±ÙØ¯ +اثر +بدÙÙ +Ø¨ÙØªØ±ÙÙ +Ø¨ÙØ´ØªØ± +Ø§ÙØ¨ØªÙ +ب٠+براساس +Ø¨ÙØ±ÙÙ +ÙØ±Ø¯ +بعض٠+Ú¯Ø±ÙØª +تÙÙ +ا٠+Ù ÙÙÙÙÙ +ا٠+Ø¬Ø±ÙØ§Ù +تÙÙ +بر +٠اÙÙØ¯ +برابر +باشÙÙ +٠دت٠+Ú¯ÙÙÙØ¯ +اÙÙÙÙ +تا +تÙÙØ§ +Ø¬Ø¯ÙØ¯ +ÚÙØ¯ +ب٠+ÙØ´Ø¯Ù +ÙØ±Ø¯Ù +ÙØ±Ø¯Ù +Ú¯ÙÙØ¯ +ÙØ±Ø¯Ù +ÙÙÙÙ +ÙÙ Ù +ÙØ²Ø¯ +رÙÙ +ÙØµØ¯ +ÙÙØ· +Ø¨Ø§ÙØ§Ù +دÙگرا٠+اÙÙ +Ø¯ÙØ±Ùز +ØªÙØ³Ø· +سÙÙ +اÙÙ +داÙÙØ¯ +سÙÙ 
+Ø§Ø³ØªÙØ§Ø¯Ù +ش٠ا +ÙÙØ§Ø± +دارÙÙ +ساخت٠+Ø·ÙØ± +ا٠د٠+Ø±ÙØªÙ +ÙØ®Ø³Øª +Ø¨ÙØ³Øª +ÙØ²Ø¯ÙÙ +Ø·Ù +ÙÙÙØ¯ +از +اÙÙØ§ +ت٠ا٠٠+داشت +ÙÙÙ +طرÙÙ +اش +ÚÙØ³Øª +Ø±ÙØ¨ +ÙÙ Ø§ÙØ¯ +Ú¯ÙØª +ÚÙØ¯ÙÙ +ÚÙØ²Ù +ØªÙØ§Ùد +ا٠+Ø§ÙØ§ +با +ا٠+Ø§ÙØ¯ +ترÙÙ +اÙÙÙÙ +دÙگر٠+را٠+ÙØ§ÙÙ +Ø¨Ø±ÙØ² +ÙÙ ÚÙØ§Ù +پاعÙÙ +ÙØ³ +ØØ¯Ùد +٠ختÙÙ +Ù ÙØ§Ø¨Ù +ÚÙØ² +Ú¯ÙØ±Ø¯ +ÙØ¯Ø§Ø±Ø¯ +ضد +ÙÙ ÚÙÙ +ساز٠+شا٠+Ù ÙØ±Ø¯ +بار٠+٠رس٠+Ø®ÙÙØ´ +Ø¨Ø±Ø®ÙØ±Ø¯Ø§Ø± +ÚÙÙ +خارج +شش +ÙÙÙØ² +ØªØØª +ض٠٠+ÙØ³ØªÙÙ +Ú¯ÙØªÙ +ÙÙØ± +Ø¨Ø³ÙØ§Ø± +Ù¾ÙØ´ +برا٠+Ø±ÙØ²Ùا٠+اÙÙÙ +ÙØ®ÙØ§ÙØ¯ +Ø¨Ø§ÙØ§ +ÙÙ +ÙÙØªÙ +ÙÙ +ÚÙÙÙ +ÙÙ +Ú¯ÙØ±Ù +ÙÙØ³Øª +است +ÙØ¬Ø§ +ÙÙØ¯ +ÙÙØ² +ÙØ§Ø¨Ø¯ +Ø¨ÙØ¯Ù +ØØªÙ +ØªÙØ§ÙÙØ¯ +Ø¹ÙØ¨ +Ø®ÙØ§Ø³Øª +ÙÙÙØ¯ +بÙÙ +ت٠ا٠+ÙÙ Ù +٠ا +Ø¨Ø§Ø´ÙØ¯ +٠ث٠+شد +ار٠+باشد +ار٠+طب٠+بعد +اگر +ØµÙØ±Øª +ØºÙØ± +جا٠+Ø¨ÙØ´ +Ø±ÙØ²Ù +Ø§ÙØ¯ +Ø²ÙØ±Ø§ +ÚÚ¯ÙÙÙ +بار +ÙØ·Ùا +Ù Ù +دربار٠+Ù Ù +Ø¯ÙØ¯Ù +ÙÙ ÙÙ +گذار٠+بردار٠+Ø¹ÙØª +گذاشت٠+ÙÙ +ÙÙÙ +ÙÙ +ÙØ§ +Ø´ÙÙØ¯ +اباد +ÙÙ ÙØ§Ø±Ù +ÙØ± +اÙÙ +Ø®ÙØ§ÙÙØ¯ +ÚÙØ§Ø± +ÙØ§Ù +Ø§Ù Ø±ÙØ² +٠ا٠+ÙØ§Ù +ÙØ¨Ù +ÙÙÙ +سع٠+تاز٠+را +ÙØ³ØªÙد +Ø²ÙØ± +جÙÙÙ +عÙÙØ§Ù +Ø¨ÙØ¯ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fi.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fi.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fi.txt new file mode 100644 index 0000000..4372c9a --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fi.txt @@ -0,0 +1,97 @@ + | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt + | This file is distributed under the BSD License. + | See http://snowball.tartarus.org/license.php + | Also see http://www.opensource.org/licenses/bsd-license.html + | - Encoding was converted to UTF-8. + | - This notice was added. 
+ | + | NOTE: To use this file with StopFilterFactory, you must specify format="snowball" + +| forms of BE + +olla +olen +olet +on +olemme +olette +ovat +ole | negative form + +oli +olisi +olisit +olisin +olisimme +olisitte +olisivat +olit +olin +olimme +olitte +olivat +ollut +olleet + +en | negation +et +ei +emme +ette +eivät + +|Nom Gen Acc Part Iness Elat Illat Adess Ablat Allat Ess Trans +minä minun minut minua minussa minusta minuun minulla minulta minulle | I +sinä sinun sinut sinua sinussa sinusta sinuun sinulla sinulta sinulle | you +hän hänen hänet häntä hänessä hänestä häneen hänellä häneltä hänelle | he she +me meidän meidät meitä meissä meistä meihin meillä meiltä meille | we +te teidän teidät teitä teissä teistä teihin teillä teiltä teille | you +he heidän heidät heitä heissä heistä heihin heillä heiltä heille | they + +tämä tämän tätä tässä tästä tähän tallä tältä tälle tänä täksi | this +tuo tuon tuotä tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that +se sen sitä siinä siitä siihen sillä siltä sille sinä siksi | it +nämä näiden näitä näissä näistä näihin näillä näiltä näille näinä näiksi | these +nuo noiden noita noissa noista noihin noilla noilta noille noina noiksi | those +ne niiden niitä niissä niistä niihin niillä niiltä niille niinä niiksi | they + +kuka kenen kenet ketä kenessä kenestä keneen kenellä keneltä kenelle kenenä keneksi| who +ketkä keiden ketkä keitä keissä keistä keihin keillä keiltä keille keinä keiksi | (pl) +mikä minkä minkä mitä missä mistä mihin millä miltä mille minä miksi | which what +mitkä | (pl) + +joka jonka jota jossa josta johon jolla jolta jolle jona joksi | who which +jotka joiden joita joissa joista joihin joilla joilta joille joina joiksi | (pl) + +| conjunctions + +että | that +ja | and +jos | if +koska | because +kuin | than +mutta | but +niin | so +sekä | and +sillä | for +tai | or +vaan | but +vai | or +vaikka | although + + +| prepositions + +kanssa | with +mukaan | according to +noin | about
+poikki | across +yli | over, across + +| other + +kun | when +niin | so +nyt | now +itse | self + http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fr.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fr.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fr.txt new file mode 100644 index 0000000..749abae --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_fr.txt @@ -0,0 +1,186 @@ + | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt + | This file is distributed under the BSD License. + | See http://snowball.tartarus.org/license.php + | Also see http://www.opensource.org/licenses/bsd-license.html + | - Encoding was converted to UTF-8. + | - This notice was added. + | + | NOTE: To use this file with StopFilterFactory, you must specify format="snowball" + + | A French stop word list. Comments begin with vertical bar. Each stop + | word is at the start of a line. 
+ +au | a + le +aux | a + les +avec | with +ce | this +ces | these +dans | with +de | of +des | de + les +du | de + le +elle | she +en | `of them' etc +et | and +eux | them +il | he +je | I +la | the +le | the +leur | their +lui | him +ma | my (fem) +mais | but +me | me +même | same; as in moi-même (myself) etc +mes | me (pl) +moi | me +mon | my (masc) +ne | not +nos | our (pl) +notre | our +nous | we +on | one +ou | where +par | by +pas | not +pour | for +qu | que before vowel +que | that +qui | who +sa | his, her (fem) +se | oneself +ses | his (pl) +son | his, her (masc) +sur | on +ta | thy (fem) +te | thee +tes | thy (pl) +toi | thee +ton | thy (masc) +tu | thou +un | a +une | a +vos | your (pl) +votre | your +vous | you + + | single letter forms + +c | c' +d | d' +j | j' +l | l' +à | to, at +m | m' +n | n' +s | s' +t | t' +y | there + + | forms of être (not including the infinitive): +été +étée +étées +étés +étant +suis +es +est +sommes +êtes +sont +serai +seras +sera +serons +serez +seront +serais +serait +serions +seriez +seraient +étais +était +étions +étiez +étaient +fus +fut +fûmes +fûtes +furent +sois +soit +soyons +soyez +soient +fusse +fusses +fût +fussions +fussiez +fussent + + | forms of avoir (not including the infinitive): +ayant +eu +eue +eues +eus +ai +as +avons +avez +ont +aurai +auras +aura +aurons +aurez +auront +aurais +aurait +aurions +auriez +auraient +avais +avait +avions +aviez +avaient +eut +eûmes +eûtes +eurent +aie +aies +ait +ayons +ayez +aient +eusse +eusses +eût +eussions +eussiez +eussent + + | Later additions (from Jean-Christophe Deschamps) +ceci | this +cela | that +celà | that +cet | this +cette | this +ici | here +ils | they +les | the (pl) +leurs | their (pl) +quel | which +quels | which +quelle | which +quelles | which +sans | without +soi | oneself + http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ga.txt
---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ga.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ga.txt new file mode 100644 index 0000000..9ff88d7 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ga.txt @@ -0,0 +1,110 @@ + +a +ach +ag +agus +an +aon +ar +arna +as +b' +ba +beirt +bhúr +caoga +ceathair +ceathrar +chomh +chtó +chuig +chun +cois +céad +cúig +cúigear +d' +daichead +dar +de +deich +deichniúr +den +dhá +do +don +dtí +dá +dár +dó +faoi +faoin +faoina +faoinár +fara +fiche +gach +gan +go +gur +haon +hocht +i +iad +idir +in +ina +ins +inár +is +le +leis +lena +lenár +m' +mar +mo +mé +na +nach +naoi +naonúr +ná +ní +níor +nó +nócha +ocht +ochtar +os +roimh +sa +seacht +seachtar +seachtó +seasca +seisear +siad +sibh +sinn +sna +sé +sí +tar +thar +thú +triúr +trí +trína +trínár +tríocha +tú +um +ár +é +éis +í +ó +ón +óna +ónár http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_gl.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_gl.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_gl.txt new file mode 100644 index 0000000..d8760b1 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_gl.txt @@ -0,0 +1,161 @@ +# galican stopwords +a +aínda +alí +aquel +aquela +aquelas +aqueles +aquilo +aquí +ao +aos +as +así +á +ben +cando +che +co +coa +comigo +con +connosco +contigo +convosco +coas +cos +cun +cuns +cunha +cunhas +da +dalgunha +dalgunhas +dalgún +dalgúns +das +de +del +dela +delas +deles +desde +deste +do +dos +dun +duns +dunha +dunhas +e +el +ela +elas +eles +en +era +eran +esa +esas +ese +eses +esta +estar +estaba +está +están +este +estes +estiven +estou +eu +é +facer +foi
+foron +fun +había +hai +iso +isto +la +las +lle +lles +lo +los +mais +me +meu +meus +min +miña +miñas +moi +na +nas +neste +nin +no +non +nos +nosa +nosas +noso +nosos +nós +nun +nunha +nuns +nunhas +o +os +ou +ó +ós +para +pero +pode +pois +pola +polas +polo +polos +por +que +se +senón +ser +seu +seus +sexa +sido +sobre +súa +súas +tamén +tan +te +ten +teñen +teño +ter +teu +teus +ti +tido +tiña +tiven +túa +túas +un +unha +unhas +uns +vos +vosa +vosas +voso +vosos +vós http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hi.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hi.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hi.txt new file mode 100644 index 0000000..86286bb --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hi.txt @@ -0,0 +1,235 @@ +# Also see http://www.opensource.org/licenses/bsd-license.html +# See http://members.unine.ch/jacques.savoy/clef/index.html. +# This file was created by Jacques Savoy and is distributed under the BSD license. +# Note: by default this file also contains forms normalized by HindiNormalizer +# for spelling variation (see section below), such that it can be used whether or +# not you enable that feature. When adding additional entries to this list, +# please add the normalized form as well.
+ठà¤à¤¦à¤° +ठत +ठपना +ठपनॠ+ठपनॠ+ठà¤à¥ +à¤à¤¦à¤¿ +à¤à¤ª +à¤à¤¤à¥à¤¯à¤¾à¤¦à¤¿ +à¤à¤¨ +à¤à¤¨à¤à¤¾ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¸ +à¤à¤¸à¤à¤¾ +à¤à¤¸à¤à¥ +à¤à¤¸à¤à¥ +à¤à¤¸à¤®à¥à¤ +à¤à¤¸à¥ +à¤à¤¸à¥ +à¤à¤¨ +à¤à¤¨à¤à¤¾ +à¤à¤¨à¤à¥ +à¤à¤¨à¤à¥ +à¤à¤¨à¤à¥ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¨à¥à¤¹à¥à¤ +à¤à¤¸ +à¤à¤¸à¤à¥ +à¤à¤¸à¥ +à¤à¤¸à¥ +à¤à¤ +à¤à¤µà¤ +à¤à¤¸ +à¤à¤¸à¥ +à¤à¤° +à¤à¤ +à¤à¤° +à¤à¤°à¤¤à¤¾ +à¤à¤°à¤¤à¥ +à¤à¤°à¤¨à¤¾ +à¤à¤°à¤¨à¥ +à¤à¤°à¥à¤ +à¤à¤¹à¤¤à¥ +à¤à¤¹à¤¾ +à¤à¤¾ +à¤à¤¾à¥à¥ +à¤à¤¿ +à¤à¤¿à¤¤à¤¨à¤¾ +à¤à¤¿à¤¨à¥à¤¹à¥à¤ +à¤à¤¿à¤¨à¥à¤¹à¥à¤ +à¤à¤¿à¤¯à¤¾ +à¤à¤¿à¤° +à¤à¤¿à¤¸ +à¤à¤¿à¤¸à¥ +à¤à¤¿à¤¸à¥ +à¤à¥ +à¤à¥à¤ +à¤à¥à¤² +à¤à¥ +à¤à¥ +à¤à¥à¤ +à¤à¥à¤¨ +à¤à¥à¤¨à¤¸à¤¾ +à¤à¤¯à¤¾ +à¤à¤° +à¤à¤¬ +à¤à¤¹à¤¾à¤ +à¤à¤¾ +à¤à¤¿à¤¤à¤¨à¤¾ +à¤à¤¿à¤¨ +à¤à¤¿à¤¨à¥à¤¹à¥à¤ +à¤à¤¿à¤¨à¥à¤¹à¥à¤ +à¤à¤¿à¤¸ +à¤à¤¿à¤¸à¥ +à¤à¥à¤§à¤° +à¤à¥à¤¸à¤¾ +à¤à¥à¤¸à¥ +à¤à¥ +तठ+तब +तरह +तिन +तिनà¥à¤¹à¥à¤ +तिनà¥à¤¹à¥à¤ +तिस +तिसॠ+तॠ+था +थॠ+थॠ+दबारा +दिया +दà¥à¤¸à¤°à¤¾ +दà¥à¤¸à¤°à¥ +दॠ+दà¥à¤µà¤¾à¤°à¤¾ +न +नहà¥à¤ +ना +निहायत +नà¥à¤à¥ +नॠ+पर +पर +पहलॠ+पà¥à¤°à¤¾ +पॠ+फिर +बनॠ+बहॠ+बहà¥à¤¤ +बाद +बाला +बिलà¤à¥à¤² +à¤à¥ +à¤à¥à¤¤à¤° +मà¤à¤° +मानॠ+मॠ+मà¥à¤ +यदि +यह +यहाठ+यहॠ+या +यिह +यॠ+रà¤à¥à¤ +रहा +रहॠ+ऱà¥à¤µà¤¾à¤¸à¤¾ +लिठ+लियॠ+लà¥à¤à¤¿à¤¨ +व +वरà¥à¤ +वह +वह +वहाठ+वहà¥à¤ +वालॠ+वà¥à¤¹ +वॠ+वà¥à¥à¤°à¤¹ +सà¤à¤ +सà¤à¤¤à¤¾ +सà¤à¤¤à¥ +सबसॠ+सà¤à¥ +साथ +साबà¥à¤¤ +साठ+सारा +सॠ+सॠ+हॠ+हà¥à¤ +हà¥à¤ +हà¥à¤ +हॠ+हà¥à¤ +हॠ+हà¥à¤¤à¤¾ +हà¥à¤¤à¥ +हà¥à¤¤à¥ +हà¥à¤¨à¤¾ +हà¥à¤¨à¥ +# additional normalized forms of the above +ठपनि +à¤à¥à¤¸à¥ +हà¥à¤¤à¤¿ +सà¤à¤¿ +तिà¤à¤¹à¥à¤ +à¤à¤à¤¹à¥à¤ +दवारा +à¤à¤¸à¤¿ +à¤à¤¿à¤à¤¹à¥à¤ +थि +à¤à¤à¤¹à¥à¤ +à¤à¤° +à¤à¤¿à¤à¤¹à¥à¤ +वहिठ+ठà¤à¤¿ +बनि +हि +à¤à¤à¤¹à¤¿à¤ +à¤à¤à¤¹à¥à¤ +हà¥à¤ +वà¤à¥à¤°à¤¹ +à¤à¤¸à¥ +रवासा +à¤à¥à¤¨ +निà¤à¥ +à¤à¤¾à¤«à¤¿ +à¤à¤¸à¤¿ +पà¥à¤°à¤¾ +à¤à¤¿à¤¤à¤° +हॠ+बहि +वहाठ+à¤à¥à¤ +यहाठ+à¤à¤¿à¤à¤¹à¥à¤ +तिà¤à¤¹à¥à¤ +à¤à¤¿à¤¸à¤¿ +à¤à¤ +यहि +à¤à¤à¤¹à¤¿à¤ +à¤à¤¿à¤§à¤° +à¤à¤à¤¹à¥à¤ +ठदि +à¤à¤¤à¤¯à¤¾à¤¦à¤¿ +हà¥à¤ +à¤à¥à¤¨à¤¸à¤¾ +à¤à¤¸à¤à¤¿ +दà¥à¤¸à¤°à¥ +à¤à¤¹à¤¾à¤ +ठप +à¤à¤¿à¤à¤¹à¥à¤ 
+उनकि +भि +वरग +हुअ +जेसा +नहिं http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hu.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hu.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hu.txt new file mode 100644 index 0000000..37526da --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hu.txt @@ -0,0 +1,211 @@ + | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt + | This file is distributed under the BSD License. + | See http://snowball.tartarus.org/license.php + | Also see http://www.opensource.org/licenses/bsd-license.html + | - Encoding was converted to UTF-8. + | - This notice was added. + | + | NOTE: To use this file with StopFilterFactory, you must specify format="snowball" + +| Hungarian stop word list +| prepared by Anna Tordai + +a +ahogy +ahol +aki +akik +akkor +alatt +által +általában +amely +amelyek +amelyekben +amelyeket +amelyet +amelynek +ami +amit +amolyan +amíg +amikor +át +abban +ahhoz +annak +arra +arról +az +azok +azon +azt +azzal +azért +aztán +azután +azonban +bár +be +belül +benne +cikk +cikkek +cikkeket +csak +de +e +eddig +egész +egy +egyes +egyetlen +egyéb +egyik +egyre +ekkor +el +elég +ellen +elő +először +előtt +első +én +éppen +ebben +ehhez +emilyen +ennek +erre +ez +ezt +ezek +ezen +ezzel +ezért +és +fel +felé +hanem +hiszen +hogy +hogyan +igen +így +illetve +ill.
+ill +ilyen +ilyenkor +ison +ismét +itt +jó +jól +jobban +kell +kellett +keresztül +keressünk +ki +kívül +között +közül +legalább +lehet +lehetett +legyen +lenne +lenni +lesz +lett +maga +magát +majd +majd +már +más +másik +meg +még +mellett +mert +mely +melyek +mi +mit +míg +miért +milyen +mikor +minden +mindent +mindenki +mindig +mint +mintha +mivel +most +nagy +nagyobb +nagyon +ne +néha +nekem +neki +nem +néhány +nélkül +nincs +olyan +ott +össze +ő +ők +őket +pedig +persze +rá +s +saját +sem +semmi +sok +sokat +sokkal +számára +szemben +szerint +szinte +talán +tehát +teljes +tovább +továbbá +több +úgy +ugyanis +új +újabb +újra +után +utána +utolsó +vagy +vagyis +valaki +valami +valamint +való +vagyok +van +vannak +volt +voltam +voltak +voltunk +vissza +vele +viszont +volna http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hy.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hy.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hy.txt new file mode 100644 index 0000000..60c1c50 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_hy.txt @@ -0,0 +1,46 @@ +# example set of Armenian stopwords.
+այդ +այլ +այն +այս +դու +դուք +եմ +են +ենք +ես +եք +է +էի +էին +էինք +էիր +էիք +էր +ըստ +թ +ի +ին +իսկ +իր +կամ +համար +հետ +հետո +մենք +մեջ +մի +ն +նա +նաև +նրա +նրանք +որ +որը +որոնք +որպես +ու +ում +պիտի +վրա +և http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_id.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_id.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_id.txt new file mode 100644 index 0000000..4617f83 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_id.txt @@ -0,0 +1,359 @@ +# from appendix D of: A Study of Stemming Effects on Information +# Retrieval in Bahasa Indonesia +ada +adanya +adalah +adapun +agak +agaknya +agar +akan +akankah +akhirnya +aku +akulah +amat +amatlah +anda +andalah +antar +diantaranya +antara +antaranya +diantara +apa +apaan +mengapa +apabila +apakah +apalagi +apatah +atau +ataukah +ataupun +bagai +bagaikan +sebagai +sebagainya +bagaimana +bagaimanapun +sebagaimana +bagaimanakah +bagi +bahkan +bahwa +bahwasanya +sebaliknya +banyak +sebanyak +beberapa +seberapa +begini +beginian +beginikah +beginilah +sebegini +begitu +begitukah +begitulah +begitupun +sebegitu +belum +belumlah +sebelum +sebelumnya +sebenarnya +berapa +berapakah +berapalah +berapapun +betulkah +sebetulnya +biasa +biasanya +bila +bilakah +bisa +bisakah +sebisanya +boleh +bolehkah +bolehlah +buat +bukan +bukankah +bukanlah +bukannya +cuma +percuma +dahulu +dalam +dan +dapat +dari +daripada +dekat +demi +demikian +demikianlah +sedemikian +dengan +depan +di +dia +dialah +dini +diri +dirinya +terdiri +dong +dulu +enggak +enggaknya +entah +entahlah +terhadap +terhadapnya +hal +hampir +hanya +hanyalah +harus +haruslah +harusnya
+seharusnya +hendak +hendaklah +hendaknya +hingga +sehingga +ia +ialah +ibarat +ingin +inginkah +inginkan +ini +inikah +inilah +itu +itukah +itulah +jangan +jangankan +janganlah +jika +jikalau +juga +justru +kala +kalau +kalaulah +kalaupun +kalian +kami +kamilah +kamu +kamulah +kan +kapan +kapankah +kapanpun +dikarenakan +karena +karenanya +ke +kecil +kemudian +kenapa +kepada +kepadanya +ketika +seketika +khususnya +kini +kinilah +kiranya +sekiranya +kita +kitalah +kok +lagi +lagian +selagi +lah +lain +lainnya +melainkan +selaku +lalu +melalui +terlalu +lama +lamanya +selama +selama +selamanya +lebih +terlebih +bermacam +macam +semacam +maka +makanya +makin +malah +malahan +mampu +mampukah +mana +manakala +manalagi +masih +masihkah +semasih +masing +mau +maupun +semaunya +memang +mereka +merekalah +meski +meskipun +semula +mungkin +mungkinkah +nah +namun +nanti +nantinya +nyaris +oleh +olehnya +seorang +seseorang +pada +padanya +padahal +paling +sepanjang +pantas +sepantasnya +sepantasnyalah +para +pasti +pastilah +per +pernah +pula +pun +merupakan +rupanya +serupa +saat +saatnya +sesaat +saja +sajalah +saling +bersama +sama +sesama +sambil +sampai +sana +sangat +sangatlah +saya +sayalah +se +sebab +sebabnya +sebuah +tersebut +tersebutlah +sedang +sedangkan +sedikit +sedikitnya +segala +segalanya +segera +sesegera +sejak +sejenak +sekali +sekalian +sekalipun +sesekali +sekaligus +sekarang +sekarang +sekitar +sekitarnya +sela +selain +selalu +seluruh +seluruhnya +semakin +sementara +sempat +semua +semuanya +sendiri +sendirinya +seolah +seperti +sepertinya +sering +seringnya +serta +siapa +siapakah +siapapun +disini +disinilah +sini +sinilah +sesuatu +sesuatunya +suatu +sesudah +sesudahnya +sudah +sudahkah +sudahlah +supaya +tadi +tadinya +tak +tanpa +setelah +telah +tentang +tentu +tentulah +tentunya +tertentu +seterusnya +tapi +tetapi +setiap +tiap +setidaknya +tidak +tidakkah +tidaklah +toh +waduh +wah +wahai +sewaktu +walau +walaupun +wong +yaitu +yakni +yang 
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_it.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_it.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_it.txt new file mode 100644 index 0000000..1219cc7 --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_it.txt @@ -0,0 +1,303 @@ + | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt + | This file is distributed under the BSD License. + | See http://snowball.tartarus.org/license.php + | Also see http://www.opensource.org/licenses/bsd-license.html + | - Encoding was converted to UTF-8. + | - This notice was added. + | + | NOTE: To use this file with StopFilterFactory, you must specify format="snowball" + + | An Italian stop word list. Comments begin with vertical bar. Each stop + | word is at the start of a line. 
+ +ad | a (to) before vowel +al | a + il +allo | a + lo +ai | a + i +agli | a + gli +all | a + l' +agl | a + gl' +alla | a + la +alle | a + le +con | with +col | con + il +coi | con + i (forms collo, cogli etc are now very rare) +da | from +dal | da + il +dallo | da + lo +dai | da + i +dagli | da + gli +dall | da + l' +dagl | da + gll' +dalla | da + la +dalle | da + le +di | of +del | di + il +dello | di + lo +dei | di + i +degli | di + gli +dell | di + l' +degl | di + gl' +della | di + la +delle | di + le +in | in +nel | in + el +nello | in + lo +nei | in + i +negli | in + gli +nell | in + l' +negl | in + gl' +nella | in + la +nelle | in + le +su | on +sul | su + il +sullo | su + lo +sui | su + i +sugli | su + gli +sull | su + l' +sugl | su + gl' +sulla | su + la +sulle | su + le +per | through, by +tra | among +contro | against +io | I +tu | thou +lui | he +lei | she +noi | we +voi | you +loro | they +mio | my +mia | +miei | +mie | +tuo | +tua | +tuoi | thy +tue | +suo | +sua | +suoi | his, her +sue | +nostro | our +nostra | +nostri | +nostre | +vostro | your +vostra | +vostri | +vostre | +mi | me +ti | thee +ci | us, there +vi | you, there +lo | him, the +la | her, the +li | them +le | them, the +gli | to him, the +ne | from there etc +il | the +un | a +uno | a +una | a +ma | but +ed | and +se | if +perché | why, because +anche | also +come | how +dov | where (as dov') +dove | where +che | who, that +chi | who +cui | whom +non | not +più | more +quale | who, that +quanto | how much +quanti | +quanta | +quante | +quello | that +quelli | +quella | +quelle | +questo | this +questi | +questa | +queste | +si | yes +tutto | all +tutti | all + + | single letter forms: + +a | at +c | as c' for ce or ci +e | and +i | the +l | as l' +o | or + + | forms of avere, to have (not including the infinitive): + +ho +hai +ha +abbiamo +avete +hanno +abbia +abbiate +abbiano +avrò +avrai +avrà +avremo +avrete +avranno +avrei +avresti +avrebbe +avremmo +avreste +avrebbero +avevo
+avevi +aveva +avevamo +avevate +avevano +ebbi +avesti +ebbe +avemmo +aveste +ebbero +avessi +avesse +avessimo +avessero +avendo +avuto +avuta +avuti +avute + + | forms of essere, to be (not including the infinitive): +sono +sei +è +siamo +siete +sia +siate +siano +sarò +sarai +sarà +saremo +sarete +saranno +sarei +saresti +sarebbe +saremmo +sareste +sarebbero +ero +eri +era +eravamo +eravate +erano +fui +fosti +fu +fummo +foste +furono +fossi +fosse +fossimo +fossero +essendo + + | forms of fare, to do (not including the infinitive, fa, fat-): +faccio +fai +facciamo +fanno +faccia +facciate +facciano +farò +farai +farà +faremo +farete +faranno +farei +faresti +farebbe +faremmo +fareste +farebbero +facevo +facevi +faceva +facevamo +facevate +facevano +feci +facesti +fece +facemmo +faceste +fecero +facessi +facesse +facessimo +facessero +facendo + + | forms of stare, to be (not including the infinitive): +sto +stai +sta +stiamo +stanno +stia +stiate +stiano +starò +starai +starà +staremo +starete +staranno +starei +staresti +starebbe +staremmo +stareste +starebbero +stavo +stavi +stava +stavamo +stavate +stavano +stetti +stesti +stette +stemmo +steste +stettero +stessi +stesse +stessimo +stessero +stando http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ja.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ja.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ja.txt new file mode 100644 index 0000000..d4321be --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_ja.txt @@ -0,0 +1,127 @@ +# +# This file defines a stopword set for Japanese. +# +# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia. +# Punctuation characters and frequent kanji have mostly been left out.
See LUCENE-3745 +# for frequency lists, etc. that can be useful for making your own set (if desired) +# +# Note that there is an overlap between these stopwords and the terms stopped when used +# in combination with the JapanesePartOfSpeechStopFilter. When editing this file, note +# that comments are not allowed on the same line as stopwords. +# +# Also note that stopping is done in a case-insensitive manner. Change your StopFilter +# configuration if you need case-sensitive stopping. Lastly, note that stopping is done +# using the same character width as the entries in this file. Since this StopFilter is +# normally done after a CJKWidthFilter in your chain, you would usually want your romaji +# entries to be in half-width and your kana entries to be in full-width. +# +の +に +は +を +た +が +で +て +と +し +れ +さ +ある +いる +も +する +から +な +こと +として +い +や +れる +など +なっ +ない +この +ため +その +あっ +よう +また +もの +という +あり +まで +られ +なる +へ +か +だ +これ +によって +により +おり +より +による +ず +なり +られる +において +ば +なかっ +なく +しかし +について +せ +だっ +その後 +できる +それ +う +ので +なお +のみ +でき +き +つ +における +および +いう +さらに +でも +ら +たり +その他 +に関する +たち +ます +ん +なら +に対して +特に +せる +及び +これら +とき +では +にて +ほか +ながら +うち +そして +とともに +ただし +かつて +それぞれ +または +お +ほど +ものの +に対する +ほとんど +と共に +といった +です +とも +ところ +ここ +##### End of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_lv.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_lv.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_lv.txt new file mode 100644 index 0000000..e21a23c --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_lv.txt @@ -0,0 +1,172 @@ +# Set of Latvian stopwords from A Stemming Algorithm for Latvian, Karlis Kreslins +# the original list of over 800 forms was refined: +# pronouns,
adverbs, interjections were removed +# +# prepositions +aiz +ap +ar +apakš +ārpus +augšpus +bez +caur +dēļ +gar +iekš +iz +kopš +labad +lejpus +līdz +no +otrpus +pa +par +pār +pēc +pie +pirms +pret +priekš +starp +šaipus +uz +viņpus +virs +virspus +zem +apakšpus +# Conjunctions +un +bet +jo +ja +ka +lai +tomēr +tikko +turpretī +arī +kaut +gan +tādēļ +tā +ne +tikvien +vien +kā +ir +te +vai +kamēr +# Particles +ar +diezin +droši +diemžēl +nebūt +ik +it +taču +nu +pat +tiklab +iekšpus +nedz +tik +nevis +turpretim +jeb +iekam +iekām +iekāms +kolīdz +līdzko +tiklīdz +jebšu +tālab +tāpēc +nekā +itin +jā +jau +jel +nē +nezin +tad +tikai +vis +tak +iekams +vien +# modal verbs +būt +biju +biji +bija +bijām +bijāt +esmu +esi +esam +esat +būšu +būsi +būs +būsim +būsiet +tikt +tiku +tiki +tika +tikām +tikāt +tieku +tiec +tiek +tiekam +tiekat +tikšu +tiks +tiksim +tiksiet +tapt +tapi +tapāt +topat +tapšu +tapsi +taps +tapsim +tapsiet +kļūt +kļuvu +kļuvi +kļuva +kļuvām +kļuvāt +kļūstu +kļūsti +kļūst +kļūstam +kļūstat +kļūšu +kļūsi +kļūs +kļūsim +kļūsiet +# verbs +varēt +varēju +varējām +varēšu +varēsim +var +varēji +varējāt +varēsi +varēsiet +varat +varēja +varēs http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_nl.txt ---------------------------------------------------------------------- diff --git a/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_nl.txt b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_nl.txt new file mode 100644 index 0000000..47a2aea --- /dev/null +++ b/data-access/config/schemas/solr/nexustiles/conf/lang/stopwords_nl.txt @@ -0,0 +1,119 @@ + | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt + | This file is distributed under the BSD License. + | See http://snowball.tartarus.org/license.php + | Also see http://www.opensource.org/licenses/bsd-license.html + | - Encoding was converted to UTF-8.
+ | - This notice was added. + | + | NOTE: To use this file with StopFilterFactory, you must specify format="snowball" + + | A Dutch stop word list. Comments begin with vertical bar. Each stop + | word is at the start of a line. + + | This is a ranked list (commonest to rarest) of stopwords derived from + | a large sample of Dutch text. + + | Dutch stop words frequently exhibit homonym clashes. These are indicated + | clearly below. + +de | the +en | and +van | of, from +ik | I, the ego +te | (1) chez, at etc, (2) to, (3) too +dat | that, which +die | that, those, who, which +in | in, inside +een | a, an, one +hij | he +het | the, it +niet | not, nothing, naught +zijn | (1) to be, being, (2) his, one's, its +is | is +was | (1) was, past tense of all persons sing. of 'zijn' (to be) (2) wax, (3) the washing, (4) rise of river +op | on, upon, at, in, up, used up +aan | on, upon, to (as dative) +met | with, by +als | like, such as, when +voor | (1) before, in front of, (2) furrow +had | had, past tense all persons sing. of 'hebben' (have) +er | there +maar | but, only +om | round, about, for etc +hem | him +dan | then +zou | should/would, past tense all persons sing. of 'zullen' +of | or, whether, if +wat | what, something, anything +mijn | possessive and noun 'mine' +men | people, 'one' +dit | this +zo | so, thus, in this way +door | through by +over | over, across +ze | she, her, they, them +zich | oneself +bij | (1) a bee, (2) by, near, at +ook | also, too +tot | till, until +je | you +mij | me +uit | out of, from +der | Old Dutch form of 'van der' still found in surnames +daar | (1) there, (2) because +haar | (1) her, their, them, (2) hair +naar | (1) unpleasant, unwell etc, (2) towards, (3) as +heb | present first person sing. of 'to have' +hoe | how, why +heeft | present third person sing. 
of 'to have' +hebben | 'to have' and various parts thereof +deze | this +u | you +want | (1) for, (2) mitten, (3) rigging +nog | yet, still +zal | 'shall', first and third person sing. of verb 'zullen' (will) +me | me +zij | she, they +nu | now +ge | 'thou', still used in Belgium and south Netherlands +geen | none +omdat | because +iets | something, somewhat +worden | to become, grow, get +toch | yet, still +al | all, every, each +waren | (1) 'were' (2) to wander, (3) wares, (3) +veel | much, many +meer | (1) more, (2) lake +doen | to do, to make +toen | then, when +moet | noun 'spot/mote' and present form of 'to must' +ben | (1) am, (2) 'are' in interrogative second person singular of 'to be' +zonder | without +kan | noun 'can' and present form of 'to be able' +hun | their, them +dus | so, consequently +alles | all, everything, anything +onder | under, beneath +ja | yes, of course +eens | once, one day +hier | here +wie | who +werd | imperfect third person sing. of 'become' +altijd | always +doch | yet, but etc +wordt | present third person sing. of 'become' +wezen | (1) to be, (2) 'been' as in 'been fishing', (3) orphans +kunnen | to be able +ons | us/our +zelf | self +tegen | against, towards, at +na | after, near +reeds | already +wil | (1) present tense of 'want', (2) 'will', noun, (3) fender +kon | could; past tense of 'to be able' +niets | nothing +uw | your +iemand | somebody +geweest | been; past participle of 'be' +andere | other
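The Dutch list above uses the Snowball layout. Comments begin with a vertical bar, each stop word sits at the start of a line, and the file header notes that StopFilterFactory must be given format="snowball" to read it. The following is a minimal Python sketch of that convention, assuming the layout is exactly as the header describes. It also assumes that any whitespace-separated token before a `|` counts as a stop word. It is not Lucene's own loader. The function names and the `ignore_case` flag are hypothetical, chosen for illustration to echo the case-insensitive stopping described in the Japanese header.

```python
from typing import Iterable


def parse_snowball_stopwords(text: str) -> set[str]:
    """Collect stop words from Snowball-format text (hypothetical helper).

    Assumption: everything after the first '|' on a line is a comment, and
    every whitespace-separated token before it is a stop word.
    """
    words: set[str] = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # keep only the part before any comment
        words.update(content.split())    # blank or comment-only lines add nothing
    return words


def remove_stopwords(
    tokens: Iterable[str], stopwords: set[str], ignore_case: bool = True
) -> list[str]:
    """Drop tokens found in `stopwords` (hypothetical helper).

    With ignore_case=True both sides are lower-cased before comparison,
    mimicking a case-insensitive stop filter.
    """
    if ignore_case:
        folded = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in folded]
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    sample = " | A Dutch stop word list.\n\nde | the\nen | and\nvan | of, from\n"
    stops = parse_snowball_stopwords(sample)
    print(sorted(stops))
    print(remove_stopwords(["De", "kat", "en", "de", "hond"], stops))
```

Because of the lower-casing, "De" and "de" are both removed in the usage example. Passing `ignore_case=False` keeps "De", since only the exact lower-case form appears in the list.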
