Thanks Geert and Justin for the suggestions. What we are looking for is this: 
when a user searches for "tax asset", we expect a match even if the source file 
presents it as "taxasset". The problem is that we don't know how many run-in 
words (the problem ones) there are in the source, and the changes should not 
affect the existing normal functionality.
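
One way to picture this requirement is query-side expansion: leave the indexed 
content alone and, for a multi-word query, also search for the variants in 
which two adjacent words are joined. The sketch below is plain Python, not 
MarkLogic code; `expand_run_in` is a hypothetical helper, and feeding its 
output into an OR query is an assumption about how it might be wired in.

```python
def expand_run_in(query):
    """Return the query plus variants with one adjacent word pair joined."""
    words = query.split()
    # The unchanged query always comes first, so normal matches are kept.
    variants = [" ".join(words)]
    for i in range(len(words) - 1):
        joined = words[:i] + [words[i] + words[i + 1]] + words[i + 2:]
        variants.append(" ".join(joined))
    return variants


print(expand_run_in("tax asset"))           # ['tax asset', 'taxasset']
print(expand_run_in("deferred tax asset"))
```

Because the original phrase is always among the variants, a search that 
worked before still works; the joined forms only add matches.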

I will look into the suggested functions.

Thanks,

Yun

From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Friday, August 28, 2015 1:09 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to search run-in words?

Hi Yun,

I completely forgot about custom dictionaries (thanks, Justin!). You can find 
more detail here: http://docs.marklogic.com/guide/search-dev/custom-dictionaries. 
In a nutshell, a custom dictionary is a file that lets you override the 
stemming behavior of particular existing terms, and teach the stemmer how to 
stem words it doesn't know yet.

It is not entirely clear how you would use that to provide decompounding 
stemming, but it is worth a look at the very least.

Cheers,
Geert

From: [email protected] on behalf of Geert Josten <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Friday, August 28, 2015 at 7:52 AM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] How to search run-in words?

Hi Yun,

If these had been real compound words (in Dutch, `board game` is written as 
one word, `bordspel`), you could have used decompounding stemming. But that 
would not work for misspelled words like the ones below.
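
To make the idea of decompounding concrete, here is a minimal sketch of a 
dictionary-based split in plain Python. It is not MarkLogic's decompounding 
stemmer; `split_run_in` and the small `vocabulary` set are hypothetical, and a 
real word list would have to come from somewhere else.

```python
def split_run_in(token, vocabulary):
    """Split token into known words, or return None if no full split exists."""
    # best[i] holds one segmentation of token[:i], or None if none was found.
    best = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            if best[start] is not None and token[start:end] in vocabulary:
                best[end] = best[start] + [token[start:end]]
                break
    return best[len(token)]


# Hypothetical word list, just large enough for the examples.
vocabulary = {"tax", "asset", "rise", "to", "benefit", "an", "income"}
print(split_run_in("taxasset", vocabulary))   # ['tax', 'asset']
print(split_run_in("anincome", vocabulary))   # ['an', 'income']
print(split_run_in("marklogic", vocabulary))  # None
```

A split like this should probably only be tried on tokens that are not 
themselves in the word list; otherwise a correct word could be broken into 
smaller ones (for example `income` into `in` and `come`).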

I imagine you would want to be able to search on `tax` and find `tax asset`, 
right? The simplest solution would be to search with wildcards, like `tax*`.
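
As a rough illustration of what a trailing wildcard would and would not pick 
up, the following plain Python sketch uses shell-style patterns from the 
standard `fnmatch` module as a stand-in. MarkLogic's wildcard matching may 
differ in detail, and both `wildcard_matches` and the word list are made up 
for the example.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a shell-style wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]


# Hypothetical words as they might appear in the source files.
words = ["tax", "taxasset", "taxbenefit", "taxation", "asset"]
print(wildcard_matches("tax*", words))    # also picks up 'taxation'
print(wildcard_matches("*asset", words))  # needs a leading wildcard
```

Note that `tax*` also matches unrelated words such as `taxation`, and that 
finding `taxasset` from `asset` would need a leading wildcard instead.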

Cheers,
Geert

From: [email protected] on behalf of "Yang, Yun" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Friday, August 28, 2015 at 6:13 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] How to search run-in words?

All,

Is there an easy way to search on run-in words? We have some files in which 
words run together, as shown below. Can they be treated as separate words?

Sample:

run-in words     should be
taxasset         tax asset
riseto           rise to
taxbenefit       tax benefit
anincome         an income
decreasefor      decrease for
fabricatorfor    fabricator for


Thanks,

Yun


_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
