For those struggling with d:propcontains, S:property-contains,
d:eq, S:propsearch, indexing a property as TEXT or String, etc.: most
people do not seem to be aware of S:propsearch at all, and I usually
neglect to inform everybody. So I dug up a mail of mine from 5/30/2007,
sent to this list and quoted below, so nobody can say I did not share.
I even added it to the wiki [1], which I usually neglect to do as well.
Perhaps I should have started with a different first line back then :-)

-Ard

[1] http://www.hippocms.org/display/CMS/06.+Using+DASL+Queries

> If you do not want to read this mail (though it is IMO very 
> intuitive and important stuff), the conclusion is: *never* use 
> s:propcontains anymore, if you ask me! Most likely I will add 
> a new DASL search element that, for backwards compatibility, 
> won't override s:propcontains but will probably be called 
> something like s:propsearch. It should be hundreds of times 
> faster (the factor depends on how large the searched fields 
> are and how large the repository is: the larger both, the 
> larger the factor).
> 
> Explanation:
> 
> Most of you probably have seen s:propcontains and 
> s:property-contains. So, what is the difference?
> 
> An s:propcontains is done on properties that are indexed as 
> type="string" (though on a type="text" it also works). This 
> is the DEFAULT in which all properties are indexed.
> 
> A type="string" is stored in Lucene as UN_TOKENIZED, in other 
> words, as one single term. The obvious reason to do this is 
> to enable sorting on these properties. But they shouldn't be 
> used for searching!! NEVER! 
> 
> Why? Since the property is indexed UN_TOKENIZED, a 
> propcontains is translated into a Lucene WildcardQuery, and 
> this is slow (increasingly so the larger the field and the 
> larger the number of documents).
> 
> So for example, a 
> 
> <s:propcontains>
>      <dav:prop><cms:caption/></dav:prop>
>      <dav:literal>house</dav:literal>
> </s:propcontains>
> 
> is translated into a search for *house*. I think everybody 
> understands that a WildcardQuery can never be fast, and also 
> that it probably does not return the result you want (you 
> look for "house", but "housewife" is found as well (ok, 
> "houseman" too)).
> 
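
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model only, not Lucene and not the repository's API: the captions, document ids and function names are made up for illustration. It assumes an untokenized field behaves like one big term per document (so *house* becomes a substring test on each value), while a tokenized field is split on whitespace into an inverted index.

```python
# Toy model only: plain Python, not Lucene or the Hippo repository API.
# All captions and document ids below are invented for illustration.
captions = {
    "doc1": "a house in the hills",
    "doc2": "the housewife returns",
    "doc3": "notes from a houseman",
    "doc4": "garden furniture",
}


def wildcard_search(fields, word):
    """Untokenized field: the whole value is a single term, so *word*
    turns into a substring test against every stored value."""
    return sorted(doc for doc, value in fields.items() if word in value)


def build_token_index(fields):
    """Tokenized field: split each value into words, map word -> documents."""
    index = {}
    for doc, value in fields.items():
        for token in value.lower().split():
            index.setdefault(token, set()).add(doc)
    return index


def token_search(index, word):
    """A single dictionary lookup, regardless of how long the fields are."""
    return sorted(index.get(word.lower(), set()))


token_index = build_token_index(captions)
print(wildcard_search(captions, "house"))
print(token_search(token_index, "house"))
```

The wildcard-style search touches every value and also returns the "housewife" and "houseman" documents; the token lookup returns only the document that really contains the word "house".
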
> OTOH, s:property-contains only works on properties that are 
> of type="text". The main difference for indexing is that 
> type="text" is TOKENIZED. It cannot be used properly for 
> sorting anymore. When you have 
> 
> <s:property-contains>
>      <dav:prop><cms:caption/></dav:prop>
>      <dav:literal>house</dav:literal>
> </s:property-contains>
> 
> it means that 
> 
> 1) cms:caption should be of type="text"
> 2) you are looking for the exact occurrence of "house". 
> 
> In property-contains, you can use house* or *house*, which 
> will obviously be slow as well. s:property-contains is very 
> well suited for searching, for example, keywords that are 
> stored as a comma-separated string and that you analysed 
> with the commaseperatedanalyzer.
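
As a rough illustration of the keyword case, the sketch below assumes that such an analyzer simply splits on commas, trims whitespace and lowercases each keyword. That behaviour is an assumption, not taken from the actual analyzer, and the keyword values and function names are hypothetical.

```python
# Toy model only; the splitting rules here are an assumption about how a
# comma-separated analyzer might behave, not the real implementation.
def comma_separated_tokens(value):
    """Split on commas; each trimmed, lowercased keyword is one token."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def build_keyword_index(fields):
    """Map each whole keyword to the set of documents that carry it."""
    index = {}
    for doc, value in fields.items():
        for token in comma_separated_tokens(value):
            index.setdefault(token, set()).add(doc)
    return index


def keyword_search(index, keyword):
    """Exact match on a whole keyword, as a single lookup."""
    return sorted(index.get(keyword.strip().lower(), set()))


# Invented keyword properties for two documents.
keywords = {
    "doc1": "House, Garden, real estate",
    "doc2": "houseboat, canal",
}
keyword_index = build_keyword_index(keywords)
print(keyword_search(keyword_index, "house"))
print(keyword_search(keyword_index, "real estate"))
print(keyword_search(keyword_index, "estate"))
```

Under these assumptions a multi-word keyword such as "real estate" stays one token, so it is found as a whole but not through "estate" alone, and "house" does not match "houseboat".
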
> 
> Now, since there is no proper way to search for words in 
> properties indexed as type="string", I will add a new one (if 
> people all vote +1) that will look like:
> 
> <s:propsearch>
>      <dav:prop><cms:caption/></dav:prop>
>      <dav:literal>house</dav:literal>
> </s:propsearch>
> 
> This one will (logically) have equivalent performance to the 
> d:contains operator. I do not want to change the 
> s:propcontains behavior because there might be legacy 
> applications built on its specific behavior. What I also 
> might want to add is an optional boost factor, to indicate 
> for example that keywords found in the caption are 10 times 
> more important than in the content, though I am not 100% sure 
> whether this can be added very easily.
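
The boost idea can be sketched as a weighted count of matches per field. This is only an illustration under simple assumptions: the scoring formula, the field names and the sample documents are hypothetical, and real Lucene scoring is more involved than a plain weighted term count.

```python
# Toy scoring model only; not Lucene's scoring and not an existing DASL feature.
def boosted_score(doc, word, boosts):
    """Sum, over all boosted fields, boost * number of times word occurs."""
    score = 0.0
    for field, boost in boosts.items():
        tokens = doc.get(field, "").lower().split()
        score += boost * tokens.count(word.lower())
    return score


# Invented documents with a caption and a content property.
docs = {
    "doc1": {"caption": "house prices", "content": "prices went up"},
    "doc2": {"caption": "market news", "content": "every house and another house"},
}

# A hit in the caption counts 10 times as much as a hit in the content.
boosts = {"caption": 10.0, "content": 1.0}

ranked = sorted(
    docs,
    key=lambda name: boosted_score(docs[name], "house", boosts),
    reverse=True,
)
print(ranked)
```

With these weights a single hit in the caption outranks two hits in the content, which is the effect the boost factor is meant to have.
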
> 
> thx for reading until here
> 
> Ard
********************************************
Hippocms-dev: Hippo CMS development public mailinglist
