> your patch solves the problem.
I got another segmentation fault (after digging 40,000 docs). This one occurred
while parsing a Word doc
http://www.tu-chemnitz.de/wirtschaft/bwl2/download/portrait.doc
via the external parser parse_word_doc.pl.
I have no idea whether this portrait.doc is OK, but our robust digger shouldn't
die on M$ docs... (I knew it, parsing Word docs must be dangerous... :-)
(BTW, contrib/htparsedoc/parse_word_doc.pl has errors: wrong line breaks.)
(gdb) bt
#0 0x1b550 in Retriever::got_word (this=0xeffff6d8,
word=0x10b8c9a
"$J��.\231\204���9>�\213�:\031�N\0162\2264\005��\204vv\006\0318���2hkw",
location=0, heading=272) at Retriever.cc:876
#1 0x1ee10 in ExternalParser::parse (this=0x435100,
retriever=@0xeffff6d8,
base=@0xca8d68) at ExternalParser.cc:168
#2 0x1a6e0 in Retriever::RetrievedDocument (this=0xeffff6d8,
doc=@0x1eaaf0,
ref=0x83de50) at Retriever.cc:556
#3 0x1a2ac in Retriever::parse_url (this=0xeffff6d8, urlRef=@0x44b788)
at Retriever.cc:458
#4 0x19cf0 in Retriever::Start (this=0xeffff6d8) at Retriever.cc:288
#5 0x1e188 in main (ac=9, av=0xeffff8ec) at main.cc:245
- Frank
--
Email: [EMAIL PROTECTED] http://www.tu-chemnitz.de/~fri/
Work: Computing Services, Chemnitz University of Technology, Germany