Thanks Roy, I will look into Swish-e.
Edward

On Wed, Mar 16, 2011 at 11:32 AM, Roy Tennant <roytenn...@gmail.com> wrote:
> These requirements fit Swish-e [1] to a "T". I've used it to index
> millions of XML records [2], and there are no particular requirements
> for the XML -- it just needs to be well-formed. You can have it
> automatically detect and index XML fields as well as index all words
> across all fields. This is all handled by a very simple text config
> file. The only downside is you will need to write the user interface
> (CGI) in your favorite language to interact with Swish-e.
>
> For example, here is my entire config file for Current Cites [3],
> where I store citations in my own XML format:
>
> DefaultContents XML*
> UndefinedMetaTags auto
> IndexDir /home/tennantr/public_html/currentcites/cites/
> ReplaceRules remove /home/tennantr/public_html/currentcites/cites/
> PropertyNames creator title description booktitle source
> IndexOnly .xml
>
> The first line tells Swish-e to expect XML. The line
> "UndefinedMetaTags auto" tells it to keep track of any XML tag it
> sees. The next two lines tell it where the files are and remove that
> path from the index, so each result comes back as just the file name
> without the server path. The "PropertyNames" line defines which
> elements are actually stored in the index, which I can then retrieve
> directly in the search results for display to the user. The
> "IndexOnly .xml" line tells Swish-e to ignore anything without that
> filename extension. Nothing could be easier.
> Roy
>
> [1] http://swish-e.org/
> [2] http://roytennant.com/proto/hathi/
> [3] http://lists.webjunction.org/currentcites/
>
> On Wed, Mar 16, 2011 at 8:00 AM, Edward M. Corrado <ecorr...@ecorrado.us>
> wrote:
>> Hi,
>>
>> I [will soon] have a small set (< 1000 records) of Dublin Core
>> metadata published in OAI_DC format that I want to be searchable via a
>> Web browser. Normally we would use Ex Libris's Primo for this, but
>> this particular set of data may have some confidential information and
>> our repository only has minimal built-in search functions. While we
>> still may go with Primo for these records, I am looking at other
>> possibilities. The requirements as I see them are:
>>
>> 1) Can ingest records in OAI_DC format
>> 2) Allow remote end-users who are familiar with the collection to
>> search these ingested records via a Web browser.
>> 3) Search should be keyword anywhere or individual fields, although it
>> does not need to have every whizzbang feature out there. In other
>> words, basic search features are fine.
>> 4) Should support the ability to link to the display copy in our
>> repository (probably goes without saying)
>> 5) Should be simple to install and maintain (Thus, at least in my
>> mind, eliminating something like Blacklight)
>> 6) Preferably a LAMP application, although a Windows server based
>> solution is a possibility as well
>> 7) Preferably Open Source, or at least no- or low-cost
>>
>> I haven't been able to find anything searching the Web, but it seems
>> like something people may have done before. Before I re-invent the
>> wheel or shoe-horn something together, does anyone have any
>> suggestions?
>>
>> Edward
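
For the "write the user interface yourself" step Roy mentions, here is a rough Python sketch of the glue a CGI script might need: build a swish-e search command and parse what comes back. It is not taken from Roy's setup. The index path, the function names, and the sample output are all hypothetical, and the parser assumes Swish-e's default result line (rank, path, quoted title, size), with "#" header lines and a lone "." ending the results. Check that against the output of your own installed version before relying on it.

```python
import re
import subprocess

# Hypothetical index location; use whatever file your indexing run produced.
INDEX_FILE = "/path/to/index.swish-e"

# Assumed default result line: rank, path, quoted title, size in bytes.
RESULT_LINE = re.compile(r'^(\d+) (.+) "(.*)" (\d+)$')


def build_search_command(query, index_file=INDEX_FILE, max_results=20):
    """Return an argv list for a search; passing a list avoids shell quoting."""
    return ["swish-e", "-w", query, "-f", index_file, "-m", str(max_results)]


def parse_results(output):
    """Turn raw search output into a list of dicts, one per hit."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, '#' headers, the closing '.', and error lines.
        if not line or line.startswith("#") or line == "." or line.startswith("err:"):
            continue
        match = RESULT_LINE.match(line)
        if match:
            rank, path, title, size = match.groups()
            hits.append(
                {"rank": int(rank), "path": path, "title": title, "size": int(size)}
            )
    return hits


def search(query, index_file=INDEX_FILE):
    """Run swish-e (must be installed and on PATH) and return parsed hits."""
    completed = subprocess.run(
        build_search_command(query, index_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_results(completed.stdout)
```

A CGI or small web handler would call search() with the user's query and turn each hit's "path" into a link to the display copy in the repository. Because ReplaceRules strips the server path at index time, the script has to add the repository's base URL back itself.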