> your patch solves the problem.

Got another segmentation fault (after digging 40,000 docs); this one occurred
while parsing a Word doc
http://www.tu-chemnitz.de/wirtschaft/bwl2/download/portrait.doc
via the external parser parse_word_doc.pl.

I've no idea whether this portrait.doc is OK, but our robust digger shouldn't
die on M$ docs... (I knew it, parsing Word docs must be dangerous... :-)

(BTW, contrib/htparsedoc/parse_word_doc.pl has errors: wrong line breaks.)

(gdb) bt
#0  0x1b550 in Retriever::got_word (this=0xeffff6d8, 
    word=0x10b8c9a "$J��.\231\204���9>�\213�:\031�N\0162\2264\005��\204vv\006\0318���2hkw", 
    location=0, heading=272) at Retriever.cc:876
#1  0x1ee10 in ExternalParser::parse (this=0x435100, retriever=@0xeffff6d8, 
    base=@0xca8d68) at ExternalParser.cc:168
#2  0x1a6e0 in Retriever::RetrievedDocument (this=0xeffff6d8, doc=@0x1eaaf0, 
    ref=0x83de50) at Retriever.cc:556
#3  0x1a2ac in Retriever::parse_url (this=0xeffff6d8, urlRef=@0x44b788)
    at Retriever.cc:458
#4  0x19cf0 in Retriever::Start (this=0xeffff6d8) at Retriever.cc:288
#5  0x1e188 in main (ac=9, av=0xeffff8ec) at main.cc:245

- Frank
-- 
Email: [EMAIL PROTECTED]  http://www.tu-chemnitz.de/~fri/
Work:  Computing Services,  Chemnitz University of Technology,  Germany


------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
