According to Jamie Anstice:
> Seeing as people are talking about this, I thought I'd relate my hack at
> displaying & searching on specific metadata fields.  We had a requirement
> to restrict queries to pages containing a specific meta-tag element or
> elements, and also to display meta-tag content in the output.
>
> First I added a meta-data List to DocumentRef, with methods to return the
> list, and add strings to the list (DocumentRef is what's stored in the
> db.docdb for each document).  Then I added a got_metatag method to
> Retriever to store a meta-tag content in the DocumentRef.  Then in the
> HTML parser I modified do_tag to call Retriever::got_metatag when it sees
> an appropriate metatag (I made a config attribute meta_tag_store which can
> take a StringList of metatag names to store, or 'all', and only matching
> meta-tag NAMEs are stored - I only store the meta-tag content attribute,
> indexed by the name attribute.  This might be a bit non-robust, but it's
> good enough for 2 hrs of hacking).  This gives us the ability to store
> arbitrary meta-data.
>
> Then I modified htsearch/Display::displayMatch to read another config
> attribute meta_tag_display, which is a list of tag names to make into
> variables suitable for inclusion in output templates (again 'all' is an
> option too).  For each meta-tag stored in the retrieved DocumentRef, if it
> matches a name in the list of tags to display, it prefixes the name with
> mt_ and puts the new name and the content in the vars structure to make it
> available for the page writer.
>
> This means that we can surface meta-data, but we still can't search on it.
> I decided that introducing new terms into the main index was the easiest
> way - it's not flash, but it does us for now.  As part of the HTML parser
> do_tag, I read another config attribute meta_tag_index which has a list of
> all the tag names which will be indexed.  When a matching tag comes up, I
> make up a keyword mt_<tag-name>_<tag-content> and add that to the index
> for the current document (I use the existing word breaking code to break
> up multi-word tag contents, so a tag <meta name="Platform" content="Linux,
> Solaris, Irix"> would turn into three words mt_platform_linux,
> mt_platform_solaris, mt_platform_irix - they're all forced to lower case).
> Then I just use the keywords= CGI parameter to htsearch to include the
> keywords I want to restrict - we've got a php advanced search page with a
> bunch of list selects on it, and a redirect page which flattens the
> multiple options into a single keywords entry (sometime I'd like to modify
> the keywords parameter handling to allow it to take boolean queries, but
> that can wait for a bit).  I needed to play with the punctuation
> characters to allow _ in words, and Bob's your uncle.
>
> I hope this is of use or interest to someone - I've implemented this on
> our 3.2.x based tree (and I won't post a patch because our tree has
> diverged too far - soon I'll have to make it based on the snapshots
> again), but something similar should work on the 3.1.x too.

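For anyone following along, the keyword scheme described above can be
sketched roughly as follows.  This is only an illustration under stated
assumptions, not Jamie's actual code and not part of ht://Dig: the
function names toLower and makeMetaKeywords are made up, and the word
breaking here is a simple split on non-alphanumeric characters rather
than ht://Dig's existing word breaking code.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not an ht://Dig function): lower-case an ASCII string.
static std::string toLower(const std::string &s)
{
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s)
        out += static_cast<char>(std::tolower(c));
    return out;
}

// Hypothetical helper (not an ht://Dig function): build one
// "mt_<tag-name>_<word>" keyword for each word in the meta-tag content.
// Assumption: a "word" is a run of alphanumeric characters; everything
// else (commas, spaces, and so on) is treated as a separator.
std::vector<std::string> makeMetaKeywords(const std::string &name,
                                          const std::string &content)
{
    std::vector<std::string> keywords;
    const std::string prefix = "mt_" + toLower(name) + "_";
    std::string word;

    for (unsigned char c : content) {
        if (std::isalnum(c)) {
            // Force every character to lower case as it is collected.
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            // A separator ends the current word: emit its keyword.
            keywords.push_back(prefix + word);
            word.clear();
        }
    }
    if (!word.empty())
        keywords.push_back(prefix + word);  // last word, no trailing separator

    return keywords;
}
```

Calling makeMetaKeywords("Platform", "Linux, Solaris, Irix") yields the
three keywords from the example above: mt_platform_linux,
mt_platform_solaris and mt_platform_irix.  As noted in the message, the
indexer has to accept _ as a word character for such keywords to survive
intact.
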
This all sounds quite interesting.  Most of it is somewhat similar to what
I'd envisioned implementing.  The main difference is that rather than
adding prefixes to the words indexed in meta tags, I envisioned indexing
the words as-is, but with new flag values to distinguish them in the word
index, and to have control over their scoring factors.

In any case, I don't know how old the 3.2.x code was that you used as a
starting point, but there have been a LOT of bug fixes since 3.2.0b3, and
there will undoubtedly be more, so it would probably be wise to resync
your code to the current CVS, and then it should be easier to provide the
patches for the changes you describe.  If we get these into CVS, that
would be one less difference for you to maintain in your code.

I don't see us putting these changes into 3.1.6, but they'd be a very
valuable addition to 3.2!

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
