I am building an htDig solution for OpenACS. I suspect OpenACS is typical
of many web applications: it contains content you want to index, and lots
of links to content you don't.
If you look at http://openacs.org/wp/ you can get an idea of what I
mean. /wp/ is Wimpypoint, a web-based "PowerPoint" equivalent. Clearly, I
want to index the presentations found under /wp/, but I don't want to walk
down all of the navigational links: my presentations, the last two weeks'
presentations, last month's presentations, etc. I just want to follow the
link that shows "all of everyone's" presentations.
1. One strategy is to use htDig in its typical mode, and use things like
exclude patterns to keep htDig on track. One problem with that, specific
to OpenACS (but maybe common to similar web apps), is that the dates are
wrong. The webserver returns the current date and time for these
db-generated presentations, not the date and time the presentation was
created.
2. Another strategy would be to create some sort of application-specific,
database-specific indexing tool. Presumably this would scour the db
tables that the application is using and create some sort of
htDig-friendly results file. Has anyone done anything like this? How might
this work?
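
To make strategy 2 a little more concrete, here is a rough Python sketch of what such a tool might look like. It is only an illustration under assumptions: the table name (presentations), its columns, the /wp/static/ URL prefix, and the use of SQLite as a stand-in database are all hypothetical and are not taken from the real OpenACS data model. It writes one static HTML page per presentation and records the creation date in a date meta tag, on the assumption that the indexer can be configured to read the date from there.

```python
import sqlite3
from datetime import datetime
from html import escape


def demo_connection():
    """Build an in-memory database with a hypothetical presentations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE presentations ("
        "presentation_id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
        "creation_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO presentations VALUES (?, ?, ?, ?)",
        [
            (1, "Intro to OpenACS", "Slide text here.", "2000-11-03 14:05:00"),
            (2, "Tcl & ADP <tips>", "More slide text.", "2000-12-01 09:30:00"),
        ],
    )
    return conn


def export_presentations(conn):
    """Yield (url_path, html) for every row, carrying the creation date."""
    rows = conn.execute(
        "SELECT presentation_id, title, body, creation_date "
        "FROM presentations ORDER BY presentation_id"
    )
    for pid, title, body, created in rows:
        # Use the date stored in the database, not the time of the export.
        created_day = datetime.fromisoformat(created).date().isoformat()
        html = (
            "<html><head>\n"
            f"<title>{escape(title)}</title>\n"
            f'<meta name="date" content="{created_day}">\n'
            "</head><body>\n"
            f"<h1>{escape(title)}</h1>\n"
            f"<p>{escape(body)}</p>\n"
            "</body></html>\n"
        )
        yield f"/wp/static/{pid}.html", html


if __name__ == "__main__":
    for path, page in export_presentations(demo_connection()):
        print(path)
        print(page)
```

The pages could be written to a directory the indexer crawls instead of the live application, which sidesteps the navigational links entirely. The cost is that the search results would then point at the static copies unless the URLs are mapped back to the real ones.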
Finally, I wonder if I could combine strategies 1 and 2 by writing a
converter or external parser. I imagine I could create a config file
unique to each web app, and that config file would specify a
web-app-specific converter that converts from text/html to, well,
text/html. Internally the converter would presumably get a page from the
server and somehow rewrite it so that htDig would follow just the right
links and would see the right dates for those pages. Could this strategy
work?
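
As a rough illustration of the rewriting step such a converter might perform, here is a minimal Python sketch. The navigation patterns, the function name rewrite_page, and the idea of carrying the date in a date meta tag are my assumptions for the example, not anything htDig or OpenACS defines. How htDig would hand the page to the converter and read the result back is left out; the external_parsers documentation would be the place to check that.

```python
import re

# Hypothetical substrings identifying navigational links to drop.
NAV_PATTERNS = ("my-presentations", "last-two-weeks", "last-month")

# Crude anchor matcher; a real HTML parser would be sturdier.
LINK_RE = re.compile(
    r'<a\s[^>]*href="([^"]*)"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL
)
HEAD_RE = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)


def rewrite_page(html, created, nav_patterns=NAV_PATTERNS):
    """Return html without navigational links and with a date meta tag.

    created is the creation date as a string, e.g. "2000-11-03".
    """

    def drop_nav(match):
        href = match.group(1)
        if any(pattern in href for pattern in nav_patterns):
            return ""  # remove the link so a crawler never sees it
        return match.group(0)

    cleaned = LINK_RE.sub(drop_nav, html)
    meta = f'<meta name="date" content="{created}">'
    # Put the meta tag right after the opening <head> tag if there is one.
    result, count = HEAD_RE.subn(
        lambda m: m.group(1) + "\n" + meta, cleaned, count=1
    )
    return result if count else meta + "\n" + cleaned


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title></head><body>"
        '<a href="/wp/my-presentations">mine</a> '
        '<a href="/wp/all">all</a>'
        "</body></html>"
    )
    print(rewrite_page(page, "2000-11-03"))
```

The creation date still has to come from somewhere, most likely a database lookup keyed on the URL, so this approach does not avoid touching the database altogether.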
What have folks done to index dynamic content on db driven pages?
Thanks,
Jerry
P.S. If you care to see the search interface I have working, you can take a
look at it at http://www.theashergroup.com/ You can see how I am
experimenting with the OpenACS.ORG indexer by visiting
http://www.theashergroup.com/demos/openacs. This is mainly implemented
with one Tcl script and a rewritten wrapper.html. I took wrapper.html and
rewrote it as an AOLserver ADP page (like an ASP page), wrapper.adp. The
Tcl script execs htsearch, which runs the search and uses wrapper.adp to
format its output. I then take the output from htsearch and have AOLserver
interpret it as an ADP page. The results are returned to the user.
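
For anyone who wants to try the same exec approach outside AOLserver, here is a small Python stand-in for the Tcl script. It is a sketch, not my actual code: it assumes htsearch is on the PATH and is driven CGI-style through the REQUEST_METHOD and QUERY_STRING environment variables, and the config name "openacs" is made up for the example. The ADP interpretation step is not shown.

```python
import os
import subprocess
from urllib.parse import urlencode


def run_search(words, command=("htsearch",), config="openacs"):
    """Run the search program CGI-style and return the page body.

    command is the program to exec; it is a parameter so that a
    stand-in can be used where htsearch is not installed.
    """
    query = urlencode({"words": words, "config": config})
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query)
    done = subprocess.run(
        list(command), env=env, capture_output=True, text=True, check=True
    )
    # CGI output is headers, a blank line, then the body.
    headers, separator, body = done.stdout.partition("\n\n")
    return body if separator else done.stdout


if __name__ == "__main__":
    print(run_search("wimpypoint"))
```

The returned body is what would then be handed to the template engine for interpretation before going back to the user.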
=====================================================
Jerry Asher [EMAIL PROTECTED]
1678 Shattuck Avenue Suite 161 Tel: (510) 549-2980
Berkeley, CA 94709 Fax: (877) 311-8688
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html