Title: Parsing search returns
Hi everyone,
First off, this is a great product. *All* of my troubles have turned out to be related to our new load-balancing architecture, not htdig, and when I've worked out all the kinks I'll post a workaround summary for others.
A couple questions I couldn't find the answers to in the archives:
1. Is there a way to strip out elements from the search returns? In our case, each <title> tag in the site includes the site name. So headers from search returns look like this:
The Onion | Damn You, Hearst!
The Onion | I Miss My Old Sled
The Onion | Drop Dead, Every Last One of You!
Pretty silly, right? I'd like to parse that repeating element out, preferably without employing an auxiliary script.
2. I see that there are configuration attributes for translate_latin1, translate_amp, and translate_lt_gt.
I thought translate_latin1: false might work, but I'm still getting the entity —
printing out on the page instead of the em-dash in search results. Is there a config attribute I'm missing?
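To make both questions concrete, here is a minimal Python sketch of the post-processing in question, i.e. the kind of auxiliary script I'd rather not need. It is not an htdig feature. The function names are hypothetical, the prefix is taken from the sample titles above, and the entity is assumed to be &mdash; purely for illustration.

```python
import html

# Assumed fixed prefix, copied from the sample search returns above.
SITE_PREFIX = "The Onion | "


def strip_site_prefix(title, prefix=SITE_PREFIX):
    """Drop the repeated site name from the front of a title, if present."""
    if title.startswith(prefix):
        return title[len(prefix):]
    return title


def decode_entities(text):
    """Replace HTML entities (e.g. the assumed &mdash;) with their characters."""
    return html.unescape(text)


if __name__ == "__main__":
    titles = [
        "The Onion | Damn You, Hearst!",
        "The Onion | I Miss My Old Sled",
        "The Onion | Drop Dead, Every Last One of You!",
    ]
    for title in titles:
        print(strip_site_prefix(title))
```

Titles that do not start with the prefix pass through unchanged, so running this over mixed results should be harmless.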
Thanks!
--
Adam Powell
Web Programmer, The Onion
America's Finest News Source
[EMAIL PROTECTED] | voice: 608.256.1372 | fax: 608.256.2535
www.theonion.com

