[ 
https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1703:
----------------------------------------

    Attachment: NUTCH-1703v4.patch

Patch for 2.x HEAD.
Patch v3 does not compile.
With this updated patch, there are test failures:

Testcase: testGetOutlinks took 0.103 sec
        FAILED
got wrong number of outlinks (expecting 2, got 4)
answer: 
toUrl: http://www.nutch.org/abc anchor: alt1
toUrl: http://www.nutch.org/def anchor: alt2

got: 
toUrl: http://www.nutch.org/abc anchor: alt1
toUrl: http://www.nutch.org/xx.jpg anchor: alt1
toUrl: http://www.nutch.org/def anchor: alt2
toUrl: http://www.nutch.org/yy.jpg anchor: alt2
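
The two extra outlinks in the output above are the .jpg URLs, each carrying the alt text as its anchor. That suggests the image sources themselves are being emitted as outlinks, while the test expects only the enclosing links. The expected behaviour can be sketched roughly as below. This is a minimal standalone illustration, not the code in the attached patch and not Nutch's parser: the names AltTextAnchors, outlinks and firstAlt are hypothetical, the input markup is an assumption modelled on the URLs in the test output, and it assumes well-formed XHTML so that the JDK's DOM parser can read it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only (not part of Nutch).
class AltTextAnchors {

    /** Returns {toUrl, anchor} pairs, one per <a href> element in the fragment. */
    static List<String[]> outlinks(String xhtml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();

        List<String[]> links = new ArrayList<>();
        NodeList anchors = root.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (!a.hasAttribute("href")) {
                continue; // named anchors without a target are not outlinks
            }
            String anchorText = a.getTextContent().trim();
            if (anchorText.isEmpty()) {
                // Image-only link: fall back to the alt text of the nested image.
                anchorText = firstAlt(a);
            }
            // Only the link target is recorded; the img src is NOT added as an outlink.
            links.add(new String[] { a.getAttribute("href"), anchorText });
        }
        return links;
    }

    /** Returns the first non-empty alt attribute of any <img> under the link, or "". */
    static String firstAlt(Element a) {
        NodeList imgs = a.getElementsByTagName("img");
        for (int i = 0; i < imgs.getLength(); i++) {
            String alt = ((Element) imgs.item(i)).getAttribute("alt").trim();
            if (!alt.isEmpty()) {
                return alt;
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        // Assumed test markup: two links whose only content is an image.
        String page = "<body>"
            + "<a href=\"http://www.nutch.org/abc\">"
            + "<img src=\"http://www.nutch.org/xx.jpg\" alt=\"alt1\"/></a>"
            + "<a href=\"http://www.nutch.org/def\">"
            + "<img src=\"http://www.nutch.org/yy.jpg\" alt=\"alt2\"/></a>"
            + "</body>";
        for (String[] link : outlinks(page)) {
            System.out.println("toUrl: " + link[0] + " anchor: " + link[1]);
        }
    }
}
```

Whether image sources should also count as outlinks is a separate design question. The sketch only shows the variant that the current test expectation implies.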

> Nutch ignores alt text of images
> --------------------------------
>
>                 Key: NUTCH-1703
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1703
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Canan Girgin
>             Fix For: 2.3, 1.8
>
>         Attachments: NUTCH-1703v4.patch, NUTCH_1703.patch, NUTCH_1703_v3.patch
>
>
> If an image is used as a link, the alt text of that image is equivalent to the
> anchor text of a text link. During content parsing, Nutch does not extract the
> image alt text, so the anchor text for that link is empty.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
