Just taking a stab in the dark:
-- Set up a "copy field" in Solr. This basically takes the content from an existing field and creates a mirror of it.
-- Apply some extra string processing to your copy field so that it splits and tokenizes the content on the "-" (e.g., "enemy of islam" and "haverford" become two tokens on the field).
-- ???
-- Profit.
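As a rough illustration of the second step, here is a plain-Python sketch (not Solr configuration) of the tokens such a copy field would hold for one record. It assumes every stored term uses a space-hyphen-space separator ("keyword - context"). Splitting on a bare "-" would also break hyphenated names such as "al-Jazeera", so the sketch splits on " - " instead. The helper name `copy_field_tokens` and the sample terms are hypothetical. In Solr itself this splitting would presumably be done by the copy field's analyzer (for example `solr.PatternTokenizerFactory` with a pattern like `\s+-\s+`); check the schema documentation for your Solr version.

```python
from __future__ import annotations


def copy_field_tokens(terms: list[str]) -> list[str]:
    """Emulate a copy field whose analyzer splits each term on " - ".

    Hypothetical helper: in Solr the tokenizer on the copy field would do
    this work, not application code.
    """
    tokens: list[str] = []
    for term in terms:
        # Split on " - " (with the spaces) so that hyphens inside names,
        # e.g. "al-Jazeera", are left alone; drop empty pieces.
        tokens.extend(part.strip() for part in term.split(" - ") if part.strip())
    return tokens


if __name__ == "__main__":
    print(copy_field_tokens([
        "media - enemy of islam",
        "Haverford - enemy of islam",
        "al-Jazeera - religious relations",
    ]))
```

Each term contributes its keyword and its context as separate tokens, which is what makes "enemy of islam" searchable on its own.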
Seriously, though, I'm not sure what you would do after you've tokenized it. You could set up some sort of faceted browse interface to show co-occurring terms, or something else. Maybe some other Solr folks out there have some better ideas.

-Andrew

On 2012-07-11, at 11:32 AM, Laurie Allen wrote:

> Hi,
> I'm working on a Drupal site with a very complicated taxonomy.
> Backstory: A polisci professor and a team of students designed this
> project first as a theoretical exercise, as part of a senior thesis for a
> double major in political science and computer science, and then as
> the project of a very devoted and smart student using Drupal. It's
> both amazingly cool and technically complex. At this point, we are
> trying to help rein it in to the library servers and help support it
> so that new crops of students can maintain it without needing to be CS
> majors, and also to help them address a few issues and problems that
> have been discovered over the past year or so. My colleague and I are
> totally new to Drupal, and to this database. While he's working on the
> Solr indexing, I'm trying to help figure out the taxonomy issue.
>
> See here:
> http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera
> Basically, the site indexes the public statements of al-Qaeda. Each
> statement is assigned a bunch of terms by students who have studied
> jihad and al-Qaeda.
>
> Each term is composed of two parts.
> First part: a keyword from a controlled list of keywords. There are
> many of these, and they include places, people, theories, and other
> things. So, "Afghanistan", "Barack Obama", and "media" are all
> keywords.
> Second part: a context from a much smaller collection (around 20) of
> contexts, which, I guess, describe how the keyword figures in this
> statement. Examples include "area of jihad", "enemy of islam",
> "religious relations", and others.
>
> So, the full term would be "media - enemy of islam", for example.
> And each record includes a large number of these.
>
> Going forward, we'd ideally like to allow users of the site to find
> all three of the following:
> 1. Records that contain a particular two-part term. (Easy: that's
> what taxonomy is for.)
> 2. A list of terms that begin with the first part, so that they can
> select the modifier for it. (Also easy: if we make the second part a
> subterm or child of the first, this will work fine.)
> 3. A list of terms that have the second part as a qualifier. So, for
> example, show me all terms in which anything is called an "enemy of
> islam", then let me choose which keyword is referred to as an enemy
> of islam, and show me that record.
>
> It's that third one that we can't figure out. The only way we can
> think of is to duplicate each entry, so that we'd have both
> "Haverford - enemy of islam" and "enemy of islam - Haverford".
> I think that will work, but since there are many statements, and each
> statement has many terms, this solution doesn't seem ideal. Do any of
> you have ideas?
> Thanks very much.
> Laurie
> --
> Coordinator for Digital Scholarship and Services
> Haverford College Library
> 370 Lancaster Ave
> Haverford, PA 19041
> 610-896-4226
> lal...@haverford.edu
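The three lookups Laurie lists can be sketched without storing each term twice: keep each term once per record, split it into keyword and context only when indexing, and maintain lookup tables in both directions. This is a minimal in-memory Python sketch of that data model, not Drupal or Solr code; the `TermIndex` class, the record IDs, and the sample terms are hypothetical, and it assumes the " - " separator never appears inside a context label. In a real deployment the same roles would presumably be played by separate index fields and facets rather than Python dictionaries.

```python
from __future__ import annotations

from collections import defaultdict


def split_term(term: str) -> tuple[str, str]:
    # Assumes contexts never contain " - ", so splitting on the LAST
    # separator keeps any " - " that might appear inside a keyword.
    keyword, sep, context = term.rpartition(" - ")
    if not sep:
        raise ValueError(f"not a two-part term: {term!r}")
    return keyword.strip(), context.strip()


class TermIndex:
    """Hypothetical index covering the three lookups in the question."""

    def __init__(self) -> None:
        # (keyword, context) -> IDs of records tagged with that full term
        self.records_by_term: dict[tuple[str, str], set[str]] = defaultdict(set)
        # keyword -> contexts it has been paired with
        self.contexts_by_keyword: dict[str, set[str]] = defaultdict(set)
        # context -> keywords it has qualified
        self.keywords_by_context: dict[str, set[str]] = defaultdict(set)

    def add(self, record_id: str, terms: list[str]) -> None:
        for term in terms:
            keyword, context = split_term(term)
            self.records_by_term[(keyword, context)].add(record_id)
            self.contexts_by_keyword[keyword].add(context)
            self.keywords_by_context[context].add(keyword)

    def records_with(self, keyword: str, context: str) -> list[str]:
        # Lookup 1: records tagged with one particular two-part term.
        return sorted(self.records_by_term.get((keyword, context), set()))

    def contexts_for(self, keyword: str) -> list[str]:
        # Lookup 2: contexts that a given keyword appears with.
        return sorted(self.contexts_by_keyword.get(keyword, set()))

    def keywords_for(self, context: str) -> list[str]:
        # Lookup 3: keywords that a given context qualifies.
        return sorted(self.keywords_by_context.get(context, set()))


if __name__ == "__main__":
    index = TermIndex()
    index.add("statement-1", ["media - enemy of islam", "Afghanistan - area of jihad"])
    index.add("statement-2", ["Haverford - enemy of islam", "media - religious relations"])
    print(index.keywords_for("enemy of islam"))               # ['Haverford', 'media']
    print(index.records_with("Haverford", "enemy of islam"))  # ['statement-2']
    print(index.contexts_for("media"))                        # ['enemy of islam', 'religious relations']
```

Because the split happens per term, each keyword stays paired with its own context. A flat bag of tokens per record, as in a plain tokenized copy field, would let a keyword from one term match a context from a different term in the same record; here `records_with("Haverford", "religious relations")` correctly returns an empty list even though statement-2 contains both words.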