Hi all,

As you can see in the subject of this message, the matter has been fixed!
Yesterday I ran some tests based on input from you guys, and also based
on some of my own conclusions. In one of the tests I kept the
'downloaded' pdf files in the temp folder for investigation rather than
deleting them (which is the normal procedure). I discovered that the pdf
file was corrupted and could only be read partially.

Then I remembered that in one of the howtos I read on the Internet a
while ago, when I was trying to compile htdig 3.1.6 on Windows, someone
stated that on Windows files have to be written with the binary flag on.
I concluded that this was the reason why the pdf file got corrupted, and
that htdig didn't use the binary flag in my case.
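To illustrate what the binary flag does: in text mode a Windows C runtime
may translate line endings while writing, so the file on disk no longer
matches the bytes that were sent. The sketch below is not htdig code;
write_binary_file is a hypothetical helper, and the 0600 permission
argument is my own assumption. It writes a buffer with O_BINARY (defined
as 0 where the platform lacks it) and returns the size on disk, which
should equal the buffer length when no translation took place.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>

// Unix-like systems have no O_BINARY; defining it as 0 turns the flag
// into a harmless no-op there, so the same open() call compiles everywhere.
#ifndef O_BINARY
#define O_BINARY 0
#endif

// Hypothetical helper (not part of htdig): writes len bytes to a new file
// at path in binary mode. Returns the file size on disk, or -1 on error.
long write_binary_file(const char *path, const char *data, size_t len)
{
    // O_EXCL makes the call fail if the file already exists.
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_BINARY, 0600);
    if (fd < 0)
        return -1;

    size_t written = 0;
    while (written < len) {
        ssize_t n = write(fd, data + written, len - written);
        if (n < 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        written += static_cast<size_t>(n);
    }
    close(fd);

    // Compare this value with len: a mismatch means bytes were translated.
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return static_cast<long>(st.st_size);
}
```

If the returned size differs from the number of bytes passed in, the file
was most likely written in text mode.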

Now, normally when you compile htdig on Windows you'd probably use a C/C++
compiler for Windows, but I didn't; I used Cygwin. Thus, the Cygwin
environment thinks it has to compile for Linux/Unix (and thus no binary
flag), but you do end up with DOS/Windows executables.

I've searched the source and made some changes which I think might be of use
to someone. Remember, I compiled htdig version 3.2.0b6 using Cygwin! I
don't know what happens if you use MSVC or another Windows-based C/C++
compiler. Try before you alter ;) Don't blame me :P

After unpacking the zip/tarball to your computer, navigate to the htdig
subfolder. You will find a file called ExternalParser.cc

Open this file and locate this piece of code (starts at line 186):

// *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP*
// ===================================================================== 

    //
    // Write the contents to a temporary file.
    //
    String      path = getenv("TMPDIR");
    int         fd;
    if (path.length() == 0)
      path = "/tmp";
 #ifndef HAVE_MKSTEMP
    path << "/htdext." << getpid(); // This is unfortunately predictable

#ifdef O_BINARY
    fd = open((char*)path, O_WRONLY|O_CREAT|O_EXCL|O_BINARY);
#else
    fd = open((char*)path, O_WRONLY|O_CREAT|O_EXCL);
#endif
#else
    path << "/htdex.XXXXXX";
    fd = mkstemp((char*)path);
    // can we force binary mode somehow under Cygwin, if it has mkstemp?
 #endif
    if (fd < 0)
    {
      if (debug)

// *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP*
// =====================================================================


Then change it into:

// *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP*
// =====================================================================
    //
    // Write the contents to a temporary file.
    //
    String      path = getenv("TMPDIR");
    int         fd;
    if (path.length() == 0)
      path = "/tmp";
// #ifndef HAVE_MKSTEMP
    path << "/htdext." << getpid(); // This is unfortunately predictable

#ifdef O_BINARY
    fd = open((char*)path, O_WRONLY|O_CREAT|O_EXCL|O_BINARY);
#else
    fd = open((char*)path, O_WRONLY|O_CREAT|O_EXCL|O_BINARY);
#endif
// #else
//    path << "/htdex.XXXXXX";
//    fd = mkstemp((char*)path);
    // can we force binary mode somehow under Cygwin, if it has mkstemp?
// #endif
    if (fd < 0)
    {
      if (debug)

// *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP* *SNIP*
// =====================================================================

FYI: within Cygwin you can use the mkstemp function, but the temporary file
it creates for you is not opened with the binary flag. Therefore we want to
comment it out altogether. We then end up with the choice between binary or
not, which I forced to be binary in both branches. (I could also have
deleted or commented out the whole #ifdef O_BINARY stuff and just used the
first statement (with O_BINARY), but I'm too lazy to delete it all :P)

Then recompile (or compile if you're just starting) and voila. Htdig should
be working fine on Windows.

I've tested it on the file which gave me problems. I can now open the
temporary file without problems or error messages, and the file can be
found using any of the words in the pdf file (which wasn't possible
before). I'm now running the whole search again, but I'm not expecting any
more problems.

I'd like to say thank you to all the people who have given me input and
suggestions!

See you!

Marco



