On Saturday 15 February 2003 09:33, you wrote:
> According to Adam Brown:
> > On Friday 14 February 2003 11:14, Adam Brown wrote:
> > ...
> > > I am indexing this page using Htdig 3.1.6:
> > > http://wire.org.au/information/violence/domestic/womens_stories/one_rural_womans_story.html
> > > The page contains the words "woman's" and "womans" but not "woman".
> > >
> > > The search page is located at: http://wire.org.au/public_search.html
> > >
> > > When I search for "rural woman's" or "rural womans" I get no hits.
> > > However when I search for "woman" the page is returned.
> > >
> > > My understanding is that, with the default Htdig settings,
> > > "woman's" gets indexed as "womans". So surely a search for "womans"
> > > should be successful.
>
> That's correct, assuming you're using the defaults. The word "woman's"
> should get indexed as both "womans" and "woman". If you had changed
> valid_punctuation at the time you indexed, taking out the apostrophe,
> then this would not work, and "woman's" would be indexed only as "woman".
> You should check your db.wordlist file to make sure that both woman and
> womans appear in there.
>
> > Researching further:
> >
> > Results from htdig -vvvv indicate that the word "woman" is indexed, not
> > "womans"
>
> If you are indeed running htdig version 3.1.6, then the -vvvv output should
> show, when the word "woman's" is parsed, the following lines:
>
> word: woman's@(location)
> word part: woman@(location)
>
> Both of these should go into db.wordlist, with the apostrophe being
> stripped from the first one.

"womans" does not appear in the db.wordlist, nor does it appear in the -vvvv output (see the attached rundig output). I have run this with both the default 'valid_punctuation' and my customised one.

I can't work out why it would be indexing "womans" as "woman". Why does it trim the "s"?

Regards,
Adam

1:1:http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html New server: localhost, 80 Retrieval command for http://localhost/robots.txt: GET /robots.txt HTTP/1.0 User-Agent: htdig/3.1.6 ([EMAIL PROTECTED]) Host: localhost Header line: HTTP/1.1 404 Not Found Header line: Date: Mon, 17 Feb 2003 23:24:52 GMT Header line: Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_perl/1.26 Header line: Connection: close Header line: Content-Type: text/html; charset=ISO-8859-1 Header line: returnStatus = 1 pushed pick: localhost, # servers = 1 0:0:0:http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html: Retrieval command for http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html: GET /information/violence/domestic/womens_stories/one_rural_womans_story.html HTTP/1.0 User-Agent: htdig/3.1.6 ([EMAIL PROTECTED]) Host: localhost Header line: HTTP/1.1 200 OK Header line: Date: Mon, 17 Feb 2003 23:24:52 GMT Header line: Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_perl/1.26 Header line: Connection: close Header line: Content-Type: text/html; charset=ISO-8859-1 Header line: returnStatus = 0 Read 8060 from document Read a total of 8060 bytes Tag: <html>, matched -1 Tag: <head>, matched -1 Tag: <title>, matched 0 word: One@11 word: rural@12 word: woman@14 word: story@17 Tag: </title>, matched 1 title: One rural woman�s story Tag: <link REL="icon" HREF="/webgui-wire/www/extras/favicon.png" TYPE="image/png">, matched 26 href: http://localhost/webgui-wire/www/extras/favicon.png () resolving 'http://localhost/webgui-wire/www/extras/favicon.png' Tag: <style type="text/css">, matched 27 Tag: </style>, matched 28 Tag: <link href="/webgui-wire/www/extras/default.css" rel="stylesheet" type="text/css">, matched 26 href: http://localhost/webgui-wire/www/extras/default.css () Rejected: Extension is invalid! 
url rejected: (level 1)http://localhost/webgui-wire/www/extras/default.css Tag: <META NAME="Date" CONTENT="Thu, 21 Nov 2002 17:29:39 GMT">, matched 20 Tag: <meta http-equiv="Keywords" name="Keywords" content="One rural woman�s story, WIRE">, matched 20 word: One@1 word: rural@1 word: woman@1 word: story@1 word: WIRE@1 Tag: <meta http-equiv="Description" name="Description" content="">, matched 20 Tag: </head>, matched -1 Tag: <body>, matched -1 Tag: <table width="100%" height="100%" border="0" cellpadding="0" cellspacing="0">, matched -1 Tag: <tr>, matched -1 Tag: <td width="16%" nowrap align="LEFT" valign="TOP" class="leftMenu">, matched -1 Tag: <div class="wobjectArticle" id="wobjectId164">, matched -1 Tag: <a name="164">, matched 2 anchor: 164 Tag: </a>, matched 3 Tag: <h1>, matched 4 word: One@246 word: rural@247 word: woman@249 word: story@252 Tag: </h1>, matched 10 Tag: <h4>, matched 7 word: CALLER@257 Tag: </h4>, matched 13 word: husband@262 word: always@266 word: abusing@269 word: and@271 word: threatening@273 word: me.@277 word: the@280 word: books@281 word: and@284 word: accounts@285 word: home@289 word: for@291 word: his@292 word: business@294 word: well@298 word: look@301 word: after@302 word: the@304 word: house@306 word: and@308 word: two@310 word: children@312 word: aged@315 word: and@317 word: months.@320 word: husband@324 word: takes@326 word: the@328 word: car@330 word: work@332 word: and@334 word: rings@335 word: several@339 word: times@341 word: day@344 word: check@347 word: what@350 word: to.@354 word: doesn@357 word: let@359 word: have@362 word: any@364 word: money@365 word: own@369 word: not@371 word: even@373 word: housekeeping.@374 word: want@381 word: buy@384 word: something@385 word: myself@389 word: have@392 word: ask@395 word: him@396 word: for@397 word: and@400 word: often@401 word: won@404 word: give@406 word: any.@409 word: has@412 word: never@413 word: hit@415 word: though@418 word: did@422 word: break@423 word: window@426 word: 
once@428 word: when@430 word: was@433 word: angry.@434 word: his@438 word: dinner@439 word: not@443 word: the@445 word: table@446 word: when@449 word: comes@451 word: the@454 word: door@456 word: calls@459 word: lazy@462 word: bitch@464 word: and@466 word: gets@468 word: angry.@469 word: Then@472 word: night@475 word: demands@478 word: sex@482 word: whenever@483 word: wants@487 word: whether@491 word: feel@494 word: like@496 word: not.@500 Tag: <h4>, matched 7 word: WIRE@503 Tag: </h4>, matched 13 word: No-one@507 word part: one@507 word: deserves@509 word: live@514 word: like@515 word: this.@517 word: You@519 word: are@521 word: undertaking@522 word: great@526 word: responsibilities@528 word: caring@534 word: for@537 word: two@538 word: very@539 word: young@541 word: children@543 word: well@547 word: sharing@550 word: the@553 word: running@554 word: the@558 word: business.@560 word: Unfortunately@563 word: many@568 word: marriages@570 word: are@573 word: not@575 word: partnership@577 word: but@581 word: are@583 word: marked@584 word: threats@587 word: fears@591 word: and@593 word: putdowns.@594 word: Your@597 word: husband@599 word: sounds@602 word: very@604 word: controlling.@606 word: Feeling@611 word: trapped@613 word: home@617 word: can@619 word: very@621 word: personally@623 word: damaging@627 word: one@631 word: self@633 word: esteem@635 word: There@638 word: are@640 word: ways@641 word: through@643 word: this@646 word: situation.@648 word: You@651 word: have@653 word: taken@655 word: major@657 word: step@659 word: speaking@662 word: about@665 word: your@667 word: situation.@669 word: This@673 word: takes@675 word: great@677 word: courage.Feelings@679 word part: courage@679 word part: Feelings@679 word: fear@686 word: and@688 word: entrapment@689 word: are@693 word: strong@694 word: signs@697 word: violence@700 word: marriage.@705 word: your@709 word: case@711 word: you@713 word: have@714 word: been@716 word: exposed@718 word: range@722 word: emotional@725 
word: financial@729 word: verbal@733 word: and@735 word: sexual@737 word: abuse.@739 word: Demanding@742 word: sex@745 word: against@746 word: your@749 word: will@751 word: also@754 word: illegal.@756 word: Keeping@759 word: yourself@761 word: safe@765 word: the@767 word: key.@769 word: Speak@770 word: those@774 word: friends@776 word: and@778 word: family@780 word: members@782 word: who@785 word: are@786 word: likely@788 word: listen@791 word: rather@794 word: than@796 word: offer@798 word: advice.@800 word: Specialist@803 word: services@807 word: such@810 word: The@813 word: Women's@814 word part: Women@814 word: Domestic@817 word: Violence@820 word: Crisis@823 word: Service@825 word: Melb@829 word: 1800@831 word: 015@833 word: 188@835 word: 9373@837 word: 0123@839 word: and@840 word: the@842 word: Centre@843 word: Against@846 word: Sexual@848 word: Assault@851 word: Melb@854 word: 1800@857 word: 806@859 word: 292@860 word: 9344@863 word: 2210@864 word: will@866 word: listen@868 word: and@870 word: explore@872 word: your@875 word: options.@876 word: course@880 word: for@883 word: further@885 word: support@887 word: and@890 word: information@892 word: WIRE@896 word: only@899 word: phone@901 word: call@903 word: away.@905 word: Breaking@907 word: the@910 word: isolation@911 word: and@915 word: obtaining@916 word: support@920 word: are@923 word: the@924 word: steps@925 word: overcoming@928 word: your@932 word: position.This@934 word part: position@934 word part: This@934 word: story@939 word: based@942 word: the@945 word: real@947 word: questions@948 word: women@952 word: ask@954 word: WIRE.@955 Tag: <p class="wireFooter">, matched -1 word: Last@965 word: updated@967 word: October@971 word: 2001@974 Tag: </p>, matched -1 Tag: <p>, matched -1 Tag: </div>, matched -1 Tag: </td>, matched -1 Tag: </tr>, matched -1 Tag: </table>, matched -1 Tag: </body>, matched -1 Tag: </html>, matched -1 size = 8060 pick: localhost, # servers = 1 htmerge: Sorting... htmerge: Merging... 
htmerge: 100:obtaining 0/http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html Preamble text: Postamble text: Note: This message will be sent again if you do not change or take away the notification of the above mentioned HTML page. Find out more about the notification service at http://www.htdig.org/meta.html Cheers! ht://Dig Notification Service
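
The indexing behaviour described in the quoted reply can be modelled with a small Python sketch. This is not ht://Dig's actual code: the function names (tokenize, index_terms, index_text), the three-character minimum word length, and the rule that any character which is neither alphanumeric nor listed in valid_punctuation separates words are all assumptions made for illustration.

```python
# Hypothetical model of the word handling described in the quoted reply.
# NOT ht://Dig's implementation; names and the minimum length are assumptions.

def tokenize(text, valid_punctuation):
    """Yield runs of alphanumerics and valid punctuation; any other character separates words."""
    token = ""
    for ch in text:
        if ch.isalnum() or ch in valid_punctuation:
            token += ch
        elif token:
            yield token
            token = ""
    if token:
        yield token


def index_terms(token, valid_punctuation, min_len=3):
    """Return the terms one token contributes: the whole word with valid
    punctuation stripped, followed by the parts between punctuation."""
    whole = "".join(ch for ch in token if ch not in valid_punctuation)
    parts, current = [], ""
    for ch in token:
        if ch in valid_punctuation:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    parts = [p for p in parts if p]
    candidates = [whole]
    if parts != [whole]:
        candidates += parts
    terms = []
    for cand in candidates:
        cand = cand.lower()
        # Assumed rule: terms shorter than min_len are not indexed.
        if len(cand) >= min_len and cand not in terms:
            terms.append(cand)
    return terms


def index_text(text, valid_punctuation):
    """Tokenize text and collect the terms for every token, in order."""
    return [term
            for token in tokenize(text, valid_punctuation)
            for term in index_terms(token, valid_punctuation)]


if __name__ == "__main__":
    print(index_text("rural woman's story", "'"))       # apostrophe is valid punctuation
    print(index_text("rural woman's story", ""))        # apostrophe removed from the list
    print(index_text("rural woman\u2019s story", "'"))  # typographic apostrophe
```

Under these assumptions, an apostrophe that is not the ASCII ' character (for example a typographic one) acts as a word separator, so the word would yield only "woman", with the lone "s" dropped as too short. Whether that is what happens on this page is not confirmed anywhere in this thread, but the title line in the -vvvv output, where the apostrophe does not display as a plain ', makes it worth checking.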

