On Saturday 15 February 2003 09:33, you wrote:
> According to Adam Brown:
> > On Friday 14 February 2003 11:14, Adam Brown wrote:
>
> ...
>
> > > I am indexing this page using Htdig 3.1.6:
> > > http://wire.org.au/information/violence/domestic/womens_stories/one_rural_womans_story.html
> > > The page contains the words "woman's" and "womans" but not "woman".
> > >
> > > The search page is located at: http://wire.org.au/public_search.html
> > >
> > > When I search for "rural woman's" or "rural womans" I get no hits.
> > > However when I search for "woman" the page is returned.
> > >
> > > My understanding is that using the default Htdig settings that
> > > "woman's" gets indexed as "womans". So surely a search for 'womans'
> > > should be successful.
>
> That's correct, assuming you're using the defaults.  The word "woman's"
> should get indexed as both "womans" and "woman".  If you had changed
> valid_punctuation at the time you indexed, taking out the apostrophe,
> then this would not work, and "woman's" would be indexed only as "woman".
> You should check your db.wordlist file to make sure that both woman and
> womans appear in there.
>
> > Researching further:
> >
> > Results from htdig -vvvv indicate that the word "woman" is indexed, not
> > "womans"
>
> If you are indeed running htdig version 3.1.6, then the -vvvv output should
> show, when the word "woman's" is parsed, the following lines:
>
> word: woman's@(location)
> word part: woman@(location)
>
> Both of these should go into db.wordlist, with the apostrophe being
> stripped from the first one.


"womans" does not appear in db.wordlist, nor does it appear in the -vvvv 
output (see the attached rundig output). I have run this with both the 
default 'valid_punctuation' and my customised one.
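
For anyone who wants to repeat the db.wordlist check, here is a minimal Python sketch of it. It assumes that db.wordlist is a plain text file in which the indexed word is the first whitespace-separated field on each line; that layout is an assumption on my part, so adjust the parsing if your file differs. The function names are made up for this example and are not part of ht://Dig.

```python
def words_present(wordlist_lines, wanted):
    """Return the words from `wanted` that occur as the first field of a line.

    Assumption: each db.wordlist line begins with the indexed word,
    followed by whitespace and the remaining fields.
    """
    wanted_lower = {word.lower() for word in wanted}
    found = set()
    for line in wordlist_lines:
        fields = line.split()
        if fields and fields[0].lower() in wanted_lower:
            found.add(fields[0].lower())
    return found


def check_file(path, wanted):
    """Open a wordlist file and report which of `wanted` it contains."""
    # latin-1 decodes any byte sequence, so odd bytes cannot raise here.
    with open(path, encoding="latin-1") as handle:
        return words_present(handle, wanted)
```

Calling check_file("db.wordlist", ["woman", "womans"]) with the path to your own database directory returns the set of those words that were found.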

I can't work out why it is indexing "woman" rather than "womans". Why does 
it trim the "s"?
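
To make the rule described above concrete, here is a minimal Python sketch of it. This is not htdig's code. It assumes that a word is a run of ASCII letters, digits and valid_punctuation characters, that any other character acts as a separator, that the whole word is stored with the punctuation stripped alongside its parts, and that anything shorter than three characters is dropped. The default punctuation string and the name index_forms are assumptions for illustration only.

```python
import re

# Assumed default for valid_punctuation; check your own htdig.conf.
ASSUMED_VALID_PUNCTUATION = ".-_/!#$%^&'"


def index_forms(text, valid_punctuation=ASSUMED_VALID_PUNCTUATION, min_length=3):
    """Return the lowercased forms that the rule described above would store."""
    punct_class = re.escape(valid_punctuation)
    forms = []
    # A token is a run of ASCII alphanumerics and valid punctuation;
    # every other character (including non-ASCII bytes) ends the token.
    for token in re.findall(f"[A-Za-z0-9{punct_class}]+", text):
        if valid_punctuation:
            parts = [p for p in re.split(f"[{punct_class}]+", token) if p]
        else:
            parts = [token]
        whole = "".join(parts)  # the word with punctuation stripped
        candidates = [whole] if len(parts) == 1 else [whole] + parts
        for candidate in candidates:
            candidate = candidate.lower()
            if len(candidate) >= min_length and candidate not in forms:
                forms.append(candidate)
    return forms


print(index_forms("woman's"))        # plain ASCII apostrophe
print(index_forms("woman\u2019s"))   # typographic apostrophe
```

Under these assumptions the ASCII apostrophe yields both "womans" and "woman", while a typographic apostrophe splits the word and leaves only "woman". The title line in the attached output shows a non-ASCII character where the apostrophe should be, so it may be worth checking which apostrophe character the page actually contains.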

Regards,

Adam


        
1:1:http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html
New server: localhost, 80
Retrieval command for http://localhost/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Host: localhost

Header line: HTTP/1.1 404 Not Found
Header line: Date: Mon, 17 Feb 2003 23:24:52 GMT
Header line: Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_perl/1.26
Header line: Connection: close
Header line: Content-Type: text/html; charset=ISO-8859-1
Header line: 
returnStatus = 1
 pushed
pick: localhost, # servers = 1
0:0:0:http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html:
 Retrieval command for http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html: GET /information/violence/domestic/womens_stories/one_rural_womans_story.html HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Host: localhost

Header line: HTTP/1.1 200 OK
Header line: Date: Mon, 17 Feb 2003 23:24:52 GMT
Header line: Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_perl/1.26
Header line: Connection: close
Header line: Content-Type: text/html; charset=ISO-8859-1
Header line: 
returnStatus = 0
Read 8060 from document
Read a total of 8060 bytes
Tag: <html>, matched -1
Tag: <head>, matched -1
Tag: <title>, matched 0
word: One@11
word: rural@12
word: woman@14
word: story@17
Tag: </title>, matched 1

title: One rural woman’s story
Tag: <link REL="icon" HREF="/webgui-wire/www/extras/favicon.png" TYPE="image/png">, matched 26
href: http://localhost/webgui-wire/www/extras/favicon.png ()
resolving 'http://localhost/webgui-wire/www/extras/favicon.png'
Tag: <style type="text/css">, matched 27
Tag: </style>, matched 28
Tag: <link href="/webgui-wire/www/extras/default.css" rel="stylesheet" type="text/css">, matched 26
href: http://localhost/webgui-wire/www/extras/default.css ()

   Rejected: Extension is invalid!
url rejected: (level 1)http://localhost/webgui-wire/www/extras/default.css
Tag: <META NAME="Date" CONTENT="Thu, 21 Nov 2002 17:29:39 GMT">, matched 20
Tag: <meta http-equiv="Keywords" name="Keywords" content="One rural woman’s story, WIRE">, matched 20
word: One@1
word: rural@1
word: woman@1
word: story@1
word: WIRE@1
Tag: <meta http-equiv="Description" name="Description" content="">, matched 20
Tag: </head>, matched -1
Tag: <body>, matched -1
Tag: <table width="100%"  height="100%" border="0" cellpadding="0" cellspacing="0">, matched -1
Tag: <tr>, matched -1
Tag: <td width="16%" nowrap align="LEFT" valign="TOP" class="leftMenu">, matched -1
Tag: <div class="wobjectArticle" id="wobjectId164">, matched -1
Tag: <a name="164">, matched 2
anchor: 164
Tag: </a>, matched 3
Tag: <h1>, matched 4
word: One@246
word: rural@247
word: woman@249
word: story@252
Tag: </h1>, matched 10
Tag: <h4>, matched 7
word: CALLER@257
Tag: </h4>, matched 13
word: husband@262
word: always@266
word: abusing@269
word: and@271
word: threatening@273
word: me.@277
word: the@280
word: books@281
word: and@284
word: accounts@285
word: home@289
word: for@291
word: his@292
word: business@294
word: well@298
word: look@301
word: after@302
word: the@304
word: house@306
word: and@308
word: two@310
word: children@312
word: aged@315
word: and@317
word: months.@320
word: husband@324
word: takes@326
word: the@328
word: car@330
word: work@332
word: and@334
word: rings@335
word: several@339
word: times@341
word: day@344
word: check@347
word: what@350
word: to.@354
word: doesn@357
word: let@359
word: have@362
word: any@364
word: money@365
word: own@369
word: not@371
word: even@373
word: housekeeping.@374
word: want@381
word: buy@384
word: something@385
word: myself@389
word: have@392
word: ask@395
word: him@396
word: for@397
word: and@400
word: often@401
word: won@404
word: give@406
word: any.@409
word: has@412
word: never@413
word: hit@415
word: though@418
word: did@422
word: break@423
word: window@426
word: once@428
word: when@430
word: was@433
word: angry.@434
word: his@438
word: dinner@439
word: not@443
word: the@445
word: table@446
word: when@449
word: comes@451
word: the@454
word: door@456
word: calls@459
word: lazy@462
word: bitch@464
word: and@466
word: gets@468
word: angry.@469
word: Then@472
word: night@475
word: demands@478
word: sex@482
word: whenever@483
word: wants@487
word: whether@491
word: feel@494
word: like@496
word: not.@500
Tag: <h4>, matched 7
word: WIRE@503
Tag: </h4>, matched 13
word: No-one@507
word part: one@507
word: deserves@509
word: live@514
word: like@515
word: this.@517
word: You@519
word: are@521
word: undertaking@522
word: great@526
word: responsibilities@528
word: caring@534
word: for@537
word: two@538
word: very@539
word: young@541
word: children@543
word: well@547
word: sharing@550
word: the@553
word: running@554
word: the@558
word: business.@560
word: Unfortunately@563
word: many@568
word: marriages@570
word: are@573
word: not@575
word: partnership@577
word: but@581
word: are@583
word: marked@584
word: threats@587
word: fears@591
word: and@593
word: putdowns.@594
word: Your@597
word: husband@599
word: sounds@602
word: very@604
word: controlling.@606
word: Feeling@611
word: trapped@613
word: home@617
word: can@619
word: very@621
word: personally@623
word: damaging@627
word: one@631
word: self@633
word: esteem@635
word: There@638
word: are@640
word: ways@641
word: through@643
word: this@646
word: situation.@648
word: You@651
word: have@653
word: taken@655
word: major@657
word: step@659
word: speaking@662
word: about@665
word: your@667
word: situation.@669
word: This@673
word: takes@675
word: great@677
word: courage.Feelings@679
word part: courage@679
word part: Feelings@679
word: fear@686
word: and@688
word: entrapment@689
word: are@693
word: strong@694
word: signs@697
word: violence@700
word: marriage.@705
word: your@709
word: case@711
word: you@713
word: have@714
word: been@716
word: exposed@718
word: range@722
word: emotional@725
word: financial@729
word: verbal@733
word: and@735
word: sexual@737
word: abuse.@739
word: Demanding@742
word: sex@745
word: against@746
word: your@749
word: will@751
word: also@754
word: illegal.@756
word: Keeping@759
word: yourself@761
word: safe@765
word: the@767
word: key.@769
word: Speak@770
word: those@774
word: friends@776
word: and@778
word: family@780
word: members@782
word: who@785
word: are@786
word: likely@788
word: listen@791
word: rather@794
word: than@796
word: offer@798
word: advice.@800
word: Specialist@803
word: services@807
word: such@810
word: The@813
word: Women's@814
word part: Women@814
word: Domestic@817
word: Violence@820
word: Crisis@823
word: Service@825
word: Melb@829
word: 1800@831
word: 015@833
word: 188@835
word: 9373@837
word: 0123@839
word: and@840
word: the@842
word: Centre@843
word: Against@846
word: Sexual@848
word: Assault@851
word: Melb@854
word: 1800@857
word: 806@859
word: 292@860
word: 9344@863
word: 2210@864
word: will@866
word: listen@868
word: and@870
word: explore@872
word: your@875
word: options.@876
word: course@880
word: for@883
word: further@885
word: support@887
word: and@890
word: information@892
word: WIRE@896
word: only@899
word: phone@901
word: call@903
word: away.@905
word: Breaking@907
word: the@910
word: isolation@911
word: and@915
word: obtaining@916
word: support@920
word: are@923
word: the@924
word: steps@925
word: overcoming@928
word: your@932
word: position.This@934
word part: position@934
word part: This@934
word: story@939
word: based@942
word: the@945
word: real@947
word: questions@948
word: women@952
word: ask@954
word: WIRE.@955
Tag: <p class="wireFooter">, matched -1
word: Last@965
word: updated@967
word: October@971
word: 2001@974
Tag: </p>, matched -1
Tag: <p>, matched -1
Tag: </div>, matched -1
Tag: </td>, matched -1
Tag: </tr>, matched -1
Tag: </table>, matched -1
Tag: </body>, matched -1
Tag: </html>, matched -1
 size = 8060
pick: localhost, # servers = 1
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:obtaining              

0/http://localhost/information/violence/domestic/womens_stories/one_rural_womans_story.html

Preamble text:


Postamble text:
Note: This message will be sent again if you do not change or
take away the notification of the above mentioned HTML page.

Find out more about the notification service at

    http://www.htdig.org/meta.html

Cheers!

ht://Dig Notification Service

