For those struggling with d:propcontains, S:property-contains, d:eq, S:propsearch, indexing a property as TEXT or String, etc.: most people do not seem to be aware of S:propsearch at all, and I am usually bad at informing everybody. So below is a mail of mine from 5/30/2007, sent to this list, so nobody can say I did not share. I even added it to the wiki, which I normally neglect as well [1]. Perhaps I should have started with a different first line back then :-)
-Ard

[1] http://www.hippocms.org/display/CMS/06.+Using+DASL+Queries

> If you do not want to read this mail (though it is IMO very
> intuitive and important stuff), the conclusion is: *never* use
> s:propcontains anymore if you ask me! Most likely, I will add
> a new dasl search thing that, for backwards compatibility,
> won't override s:propcontains, but will probably be called
> something like s:propsearch, and should be hundreds of times
> faster (the factor depends on how large the fields to search
> are and how large the repo is: the larger both, the larger
> the factor).
>
> Explanation:
>
> Most of you have probably seen s:propcontains and
> s:property-contains. So, what is the difference?
>
> A s:propcontains is done on properties that are indexed as
> type="string" (though it also works on a type="text"). This
> is the DEFAULT in which all properties are indexed.
>
> A type="string" is stored in lucene as UN_TOKENIZED, in other
> words, as one single term. The obvious reason to do this is
> to enable sorting on these properties. But they shouldn't be
> used for searching!! NEVER!
>
> Why? Since the property is indexed UN_TOKENIZED, a
> propcontains is translated into a lucene WildcardQuery, and
> this is slow (increasingly so the larger the field and the
> larger the number of documents).
>
> So for example, a
>
> <s:propcontains>
>   <dav:prop><cms:caption/></dav:prop>
>   <dav:literal>house</dav:literal>
> </s:propcontains>
>
> is translated into a search for *house*. I think everybody
> understands that a WildcardQuery can never be fast, and also
> that it probably does not return the wanted result (you
> look for "house", but "housewife" is found as well (ok,
> "houseman" too)).
>
> OTOH, s:property-contains only works on properties that are of
> type="text". The main difference for indexing is that
> type="text" is TOKENIZED. It cannot be used properly for
> sorting anymore.
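
To make the string/text difference concrete, here is a small Python sketch. It is not Lucene and not how the repository implements its index; the documents, the whitespace tokenizer, and the function names wildcard_search and term_search are all made up for illustration. It only mimics the two behaviours described above: a value kept as one single term has to be matched with a *word* pattern against every stored value, while a tokenized value can be answered with a direct lookup of the exact word.

```python
import fnmatch
from collections import defaultdict

# Hypothetical captions, keyed by document id (illustration only).
docs = {
    1: "A house by the sea",
    2: "The houseman arrived",
    3: "Garden shed for sale",
}

# Like type="string": the whole value is kept as one single term.
untokenized_terms = {doc_id: text.lower() for doc_id, text in docs.items()}

def wildcard_search(word):
    """Mimic propcontains: test the pattern *word* against every stored term."""
    pattern = f"*{word}*"
    return sorted(
        doc_id
        for doc_id, term in untokenized_terms.items()
        if fnmatch.fnmatchcase(term, pattern)
    )

# Like type="text": the value is split into tokens and each token
# points back to the documents that contain it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        inverted_index[token].add(doc_id)

def term_search(word):
    """Mimic an exact-word search: a single lookup, no scan over all values."""
    return sorted(inverted_index.get(word, set()))

print(wildcard_search("house"))  # matches "house" and "houseman"
print(term_search("house"))      # matches only the exact word "house"
```

The wildcard variant touches every stored value and also returns the "houseman" document, while the tokenized lookup returns only the document containing the word "house" itself.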
> When you have
>
> <s:property-contains>
>   <dav:prop><cms:caption/></dav:prop>
>   <dav:literal>house</dav:literal>
> </s:property-contains>
>
> it means that
>
> 1) cms:caption should be of type="text"
> 2) you are looking for the exact occurrence of "house".
>
> In property-contains, you can use house* or *house*, which
> will obviously be slow as well.
> s:property-contains is very well suited for searching, for
> example, keywords, which are a comma-separated string and
> which you analysed with the commaseperatedanalyzer.
>
> Now, since there is no proper way to search for words in
> properties indexed as type="string", I will add a new one (if
> people all vote +1) that will look like:
>
> <s:propsearch>
>   <dav:prop><cms:caption/></dav:prop>
>   <dav:literal>house</dav:literal>
> </s:propsearch>
>
> This one will (logically) have performance equivalent to the
> d:contains operator. I do not want to change the
> s:propcontains behavior because there might be legacy
> applications built on its specific behavior. What I also
> might want to add is an optional boost factor, to indicate
> for example that keywords found in the caption are 10 times
> more important than in the content, though I am not 100% sure
> whether this can be added very easily.
>
> thx for reading until here
>
> Ard

********************************************
Hippocms-dev: Hippo CMS development public mailinglist
