I have posted a similar query on the pdftohtml list.

I'm attempting to crawl portions of the web with aspseek. HTML output is working fine and is very stable. I have configured pdftohtml as a converter. It indexes most PDFs fine, so I don't think it's a config problem, but it crashes the crawl on some of them. When I download the file and run pdftohtml on it from the command line, it works fine. I'm currently running the latest sources from CVS, having first tried 1.2.6 and 1.2.10. The aspseek log output is as follows:
( 2 20 20 182 12 29 7 20) Adding URL: http://www.lsic.com/fin/annual01.pdf
exec /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA >/tmp/asoXjR7TX
Address of param: ba072d20
Address of param: ba07a560

All 20 threads then crash.
I just started using pdftohtml yesterday. Do I need different parameters?
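One way to check whether pdftohtml itself fails on these files, separately from the crawler, is to rerun the exact command from the log line on the downloaded PDFs and report how each run ends. The following is a minimal Python sketch, not part of aspseek: only the binary path and the `-i -noframes -stdout` flags come from the log above; the 60-second timeout, the `.html` output naming, and the helper names are illustrative assumptions.

```python
import subprocess
import sys

# Binary path and flags copied from the aspseek "exec" log line.
PDFTOHTML = "/usr/bin/pdftohtml"
FLAGS = ["-i", "-noframes", "-stdout"]


def build_command(pdf_path):
    """Return the argv the log shows aspseek executing for one PDF."""
    return [PDFTOHTML, *FLAGS, pdf_path]


def run_conversion(pdf_path, out_path, timeout=60):
    """Run pdftohtml on pdf_path, writing stdout to out_path (mirroring the
    '>/tmp/...' redirect in the log). The timeout is an assumed value."""
    with open(out_path, "wb") as out:
        try:
            result = subprocess.run(
                build_command(pdf_path),
                stdout=out,
                stderr=subprocess.PIPE,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return "timed out after %d s" % timeout
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "killed by signal %d" % -result.returncode
    if result.returncode != 0:
        err = result.stderr.decode(errors="replace").strip()
        return "exit status %d: %s" % (result.returncode, err)
    return "ok"


if __name__ == "__main__":
    # Hypothetical usage: python rerun_pdftohtml.py annual01.pdf other.pdf
    for path in sys.argv[1:]:
        print(path, "->", run_conversion(path, path + ".html"))
```

If every file reports "ok" here but the crawl still crashes, that would point toward how the crawler handles the converter rather than toward pdftohtml's parameters.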
