While tracing the whereabouts of some "spuriously deleted"
documents, I came to debug the Postscript::parse() function.
 It's just that there's not much to trace -- it immediately
returns (line 56).
 Looking through this year's mailing list archives, it seems that
people think that ht://Dig can actually parse PostScript, and
someone posted a problem description about not getting any
output while indexing PostScript documents.  Small wonder...

This "disabling" of PostScript parsing predates CVS logs.

Now, if I enable it by removing the "return", everything seems
to work as expected, but debugging output appears; there are
"naked" cout writes (not testing the "debug" flag).
 I say "as expected" because not all words in PostScript files
are complete or easily parseable words; often one or two
characters are expressed in ways that the PostScript parser
cannot grok, so a chopped or otherwise munged word is indexed.
See for example <URL:http://egcs.cygnus.com/scheduler.ps>.

This leads me to think that the PostScript parser is not as
complete as needed, and possibly "disabled" for a good reason.
Maybe it should be rewritten, using PDF.cc as a starting point,
or maybe the PDF parser has the same problems.

I don't know.  Maybe someone has some good answers?

Sidenote:
 If your local_urls documents are stored with a timestamp before
the epoch (1 Jan 1970), they may (on Linux) have a date "older than
nothing" (a negative date, if your time_t is signed), and will not
be indexed.  See Document.cc around line 550 (the date is zero for
newly encountered documents).
 Not that this urgently needs fixing at this level; maybe a debug
output saying "Whoops!  You have some really old documents here"
is in order (I may fix it).
 Hope all systems get a 64-bit time_t -- or at least unsigned --
before 2038...
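
The effect can be sketched as follows.  This assumes the check is
a plain "is the file newer than the recorded date" comparison, with
a recorded date of 0 for newly encountered documents; both function
names are hypothetical and this is not the literal code in
Document.cc:

```cpp
#include <ctime>

// Assumption: a document is (re)indexed only when its timestamp is
// newer than the date already on record.  A newly encountered
// document has a recorded date of 0.
bool needs_indexing(std::time_t modified, std::time_t recorded)
{
    return modified > recorded;
}

// Hypothetical diagnostic: true for timestamps before 1 Jan 1970.
// Only meaningful where time_t is signed; with an unsigned time_t
// this is always false.
bool is_pre_epoch(std::time_t modified)
{
    return modified < static_cast<std::time_t>(0);
}
```

With a signed time_t, a file dated one day before the epoch has a
timestamp of -86400, which is not greater than 0, so it is skipped;
is_pre_epoch() is where the "Whoops!" message could be emitted.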

brgds, H-P
-- 
Hans-Peter Nilsson, Axis Communications AB, S - 223 70 LUND, SWEDEN
[EMAIL PROTECTED] | Tel +46 462701867,2701800
Fax +46 46136130 | RFC 1855 compliance implemented; report loss of brain.