Hello everybody:

This is my first post to this list, so please be patient.
Sorry for my English, too; I'm not a fluent speaker.

I recently compiled and installed ht://Dig 3.2.0b5 under Cygwin to run
some experiments. I configured it with './configure --prefix=c:/htdig',
then ran 'make' and 'make install', all without any problem.

I am interested in what I think is a new feature: the ability to index
local filesystems using the 'file:' protocol, so I've configured htdig to
do such indexing.

Configuration (only changed this line):
start_url: file:///cygdrive/c/mydocuments/htdigtest/

I have not been able to make htdig index this folder with other forms of
the path, such as 'file:///mydocuments/htdigtest/' or
'file:///c:/mydocuments/htdigtest/', and I don't know why.
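
For what it's worth, here is a minimal Python sketch (not part of ht://Dig)
of how I understand the three URLs: each one reduces to a different local
path. I am assuming, without having checked the source, that htdig simply
hands that path to the Cygwin runtime. The name url_to_path is made up for
illustration.

```python
from urllib.parse import urlsplit


def url_to_path(url):
    """Return the path part of a file: URL.

    Illustrative helper only; this is not an ht://Dig function.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: " + url)
    return parts.path


# The three spellings I tried as start_url.
for url in (
    "file:///cygdrive/c/mydocuments/htdigtest/",
    "file:///mydocuments/htdigtest/",
    "file:///c:/mydocuments/htdigtest/",
):
    print(url, "->", url_to_path(url))
```

As far as I understand Cygwin's default mount layout, only the first path
('/cygdrive/c/...') names the folder on drive C:. The second would be
looked up under the Cygwin root, and the third ('/c:/...') is not an
ordinary POSIX path at all.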

Next I ran htdig with:
c:\htdig\bin> htdig -i -vvvv -s > file.log

file.log then contains the following output:
ht://dig Start Time: Thu Jan  8 17:00:29 2004
        1:1:file:///cygdrive/c/mydocuments/htdigtest/
New server: localhost, 0
 - Persistent connections: enabled
 - HEAD before GET: enabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language:
 pushed
pick: localhost, # servers = 1
> localhost supports HTTP persistent connections (infinite)
0:2:0:file:///cygdrive/c/mydocuments/htdigtest/: Making 'file' request on
file:///cygdrive/c/mydocuments/htdigtest/
Tag: <html>, matched -1
Tag: <head>, matched -1
Tag: <meta name="robots" content="noindex">, matched 20

META ROBOT: Noindex file:///cygdrive/c/mydocuments/htdigtest/
Tag: <link href="file:///cygdrive/c/mydocuments/htdigtest/comment.html">,
matched 26
href: file:///cygdrive/c/mydocuments/htdigtest/comment.html ()
resolving 'file:///cygdrive/c/mydocuments/htdigtest/comment.html'

   pushing file:///cygdrive/c/mydocuments/htdigtest/comment.html
+Tag: </head>, matched -1
Tag: <body>, matched -1
Tag: </body>, matched -1
Tag: </html>, matched -1
 ( file:///cygdrive/c/mydocuments/htdigtest/ ignored) size = 144
1:3:1:file:///cygdrive/c/mydocuments/htdigtest/comment.html: Making 'file'
request on file:///cygdrive/c/mydocuments/htdigtest/comment.html
MIME types: c:/htdig/conf/mime.types
MIME: ez        -> application/andrew-inset
MIME: hqx       -> application/mac-binhex40
MIME: cpt       -> application/mac-compactpro
MIME: doc       -> application/msword
MIME: bin       -> application/octet-stream
MIME: dms       -> application/octet-stream
MIME: lha       -> application/octet-stream
MIME: lzh       -> application/octet-stream
MIME: exe       -> application/octet-stream
MIME: class     -> application/octet-stream
MIME: oda       -> application/oda
MIME: pdf       -> application/pdf
MIME: ai        -> application/postscript
MIME: eps       -> application/postscript
MIME: ps        -> application/postscript
MIME: rtf       -> application/rtf
MIME: smi       -> application/smil
MIME: smil      -> application/smil
MIME: mif       -> application/vnd.mif
MIME: ppt       -> application/vnd.ms-powerpoint
MIME: bcpio     -> application/x-bcpio
MIME: vcd       -> application/x-cdlink
MIME: pgn       -> application/x-chess-pgn
MIME: cpio      -> application/x-cpio
MIME: csh       -> application/x-csh
MIME: dcr       -> application/x-director
MIME: dir       -> application/x-director
MIME: dxr       -> application/x-director
MIME: dvi       -> application/x-dvi
MIME: spl       -> application/x-futuresplash
MIME: gtar      -> application/x-gtar
MIME: hdf       -> application/x-hdf
MIME: js        -> application/x-javascript
MIME: skp       -> application/x-koan
MIME: skd       -> application/x-koan
MIME: skt       -> application/x-koan
MIME: skm       -> application/x-koan
MIME: latex     -> application/x-latex
MIME: nc        -> application/x-netcdf
MIME: cdf       -> application/x-netcdf
MIME: sh        -> application/x-sh
MIME: shar      -> application/x-shar
MIME: swf       -> application/x-shockwave-flash
MIME: sit       -> application/x-stuffit
MIME: sv4cpio   -> application/x-sv4cpio
MIME: sv4crc    -> application/x-sv4crc
MIME: tar       -> application/x-tar
MIME: tcl       -> application/x-tcl
MIME: tex       -> application/x-tex
MIME: texinfo   -> application/x-texinfo
MIME: texi      -> application/x-texinfo
MIME: t         -> application/x-troff
MIME: tr        -> application/x-troff
MIME: roff      -> application/x-troff
MIME: man       -> application/x-troff-man
MIME: me        -> application/x-troff-me
MIME: ms        -> application/x-troff-ms
MIME: ustar     -> application/x-ustar
MIME: src       -> application/x-wais-source
MIME: zip       -> application/zip
MIME: au        -> audio/basic
MIME: snd       -> audio/basic
MIME: mid       -> audio/midi
MIME: midi      -> audio/midi
MIME: kar       -> audio/midi
MIME: mpga      -> audio/mpeg
MIME: mp2       -> audio/mpeg
MIME: mp3       -> audio/mpeg
MIME: aif       -> audio/x-aiff
MIME: aiff      -> audio/x-aiff
MIME: aifc      -> audio/x-aiff
MIME: ram       -> audio/x-pn-realaudio
MIME: rm        -> audio/x-pn-realaudio
MIME: rpm       -> audio/x-pn-realaudio-plugin
MIME: ra        -> audio/x-realaudio
MIME: wav       -> audio/x-wav
MIME: pdb       -> chemical/x-pdb
MIME: xyz       -> chemical/x-pdb
MIME: gif       -> image/gif
MIME: ief       -> image/ief
MIME: jpeg      -> image/jpeg
MIME: jpg       -> image/jpeg
MIME: jpe       -> image/jpeg
MIME: png       -> image/png
MIME: tiff      -> image/tiff
MIME: tif       -> image/tiff
MIME: ras       -> image/x-cmu-raster
MIME: pnm       -> image/x-portable-anymap
MIME: pbm       -> image/x-portable-bitmap
MIME: pgm       -> image/x-portable-graymap
MIME: ppm       -> image/x-portable-pixmap
MIME: rgb       -> image/x-rgb
MIME: xbm       -> image/x-xbitmap
MIME: xpm       -> image/x-xpixmap
MIME: xwd       -> image/x-xwindowdump
MIME: igs       -> model/iges
MIME: iges      -> model/iges
MIME: msh       -> model/mesh
MIME: mesh      -> model/mesh
MIME: silo      -> model/mesh
MIME: wrl       -> model/vrml
MIME: vrml      -> model/vrml
MIME: css       -> text/css
MIME: asc       -> text/plain
MIME: txt       -> text/plain
MIME: rtx       -> text/richtext
MIME: rtf       -> text/rtf
MIME: sgml      -> text/sgml
MIME: sgm       -> text/sgml
MIME: tsv       -> text/tab-separated-values
MIME: etx       -> text/x-setext
MIME: xml       -> text/xml
MIME: mpeg      -> video/mpeg
MIME: mpg       -> video/mpeg
MIME: mpe       -> video/mpeg
MIME: qt        -> video/quicktime
MIME: mov       -> video/quicktime
MIME: avi       -> video/x-msvideo
MIME: movie     -> video/x-sgi-movie
MIME: ice       -> x-conference/x-cooltalk
MIME: html      -> text/html
MIME: htm       -> text/html
Read a total of 569 bytes
Tag: <html>, matched -1
Tag: <head>, matched -1
Tag: <title>, matched 0
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
Tag: </title>, matched 1

title: Core JavaScript Reference 1.5: 4 Comments
Tag: </head>, matched -1
Tag: <body>, matched -1
Tag: <a name="1066594" id="1066594">, matched 2
anchor: 1066594
Tag: </a>, matched 3
Tag: <span class="sansserif">, matched -1
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
Tag: </span>, matched -1
Tag: <p>, matched -1
Tag: <br>, matched -1
Tag: <br>, matched -1
Tag: <a name="1066617" id="1066617">, matched 2
anchor: 1066617
Tag: </a>, matched 3
Tag: <a name="comment" id="comment">, matched 2
anchor: comment
Tag: </a>, matched 3
Tag: <span class="sansserif">, matched -1
word: [EMAIL PROTECTED]
Tag: </span>, matched -1
Tag: </p>, matched -1
Tag: </body>, matched -1
Tag: </html>, matched -1
 size = 569
pick: localhost, # servers = 1
> localhost supports HTTP persistent connections (infinite)
htdig: Run complete
htdig: 1 server seen:
htdig:     localhost:0 2 documents

HTTP statistics
===============
 Persistent connections    : Yes
 HEAD call before GET      : Yes
 Connections opened        : 0
 Connections closed        : 0
 Changes of server         : 0
 HTTP Requests             : 0
 HTTP KBytes requested     : 0
 HTTP Average request time : 0 secs
 HTTP Average speed        : 0 KBytes/secs

ht://dig End Time: Thu Jan  8 17:00:29 2004
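
To double-check that the parser really did see words in this run, the log
can be tallied with a few lines of Python. This is only a sketch that
assumes the line formats shown above ('word:', 'pushing',
'META ROBOT: Noindex'); summarize_log is a helper name I made up, not part
of ht://Dig.

```python
from collections import Counter


def summarize_log(lines):
    """Count a few event types in 'htdig -vvvv' output.

    The prefixes tested here are taken from the log above; other
    ht://Dig versions may print different text.
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        if text.startswith("word:"):
            counts["words"] += 1
        elif text.startswith("pushing "):
            counts["pushed_urls"] += 1
        elif text.startswith("META ROBOT: Noindex"):
            counts["noindex_pages"] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "META ROBOT: Noindex file:///cygdrive/c/mydocuments/htdigtest/",
    "   pushing file:///cygdrive/c/mydocuments/htdigtest/comment.html",
    "Tag: <title>, matched 0",
    "title: Core JavaScript Reference 1.5: 4 Comments",
]
print(dict(summarize_log(sample)))
```

To run it on the real log, pass the lines of file.log instead of the
sample list. A non-zero count for "words" would mean that the words are
being parsed, so the problem would lie in writing them to the database.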

The file comment.html is small. It contains the following lines:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>
      Core JavaScript Reference 1.5: 4 Comments
    </title>
  </head>
  <body>
      <a name="1066594" id="1066594"></a> <span class="sansserif">This
chapter describes the syntax for comments, which can appear anywhere
between tokens.</span>
      <p>
        <br>
        <br>
         <a name="1066617" id="1066617"></a> <a name="comment"
id="comment"></a> <span class="sansserif">comment
        </span>
      </p>
  </body>
</html>

After the run, my database directory looks wrong, I think:
08/01/2004  15:35       <DIR>          .
08/01/2004  15:35       <DIR>          ..
08/01/2004  17:00               24.576 db.docs.index
08/01/2004  17:00               24.576 db.excerpts
08/01/2004  17:00               24.576 db.docdb
08/01/2004  17:00                    0 db.words.db
               4 files            73.728 bytes

As you can see, 'db.words.db' is empty!
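
The same check can be scripted, which is handy when repeating the test
after each change. A minimal sketch; empty_files is my own helper name,
and the directory is the database directory on my install.

```python
import os


def empty_files(directory):
    """Return the sorted names of zero-byte regular files in `directory`."""
    names = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_size == 0:
            names.append(entry.name)
    return sorted(names)


db_dir = "c:/htdig/var/htdig"  # database directory on my install
if os.path.isdir(db_dir):
    print(empty_files(db_dir))
```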

In fact, running 'htsearch' with
C:\htdig\cgi-bin>htsearch words=javascript > file.log

prints this:
WordDB: c:/htdig/var/htdig/db.words.db: unexpected file type or format
WordDB: DB->cursor: method meaningless before open

The output in file.log is the contents of 'nomatch.html'.

I've run some tests on my Linux box, and there it worked fine.

Any help would be much appreciated.

Thanks in advance,
Roberto.

