[jira] [Commented] (NUTCH-1870) Generic xsl parser plugin

ASF GitHub Bot (Jira) Tue, 14 Jul 2020 08:39:19 -0700


    [ 
https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157466#comment-17157466
 ]


ASF GitHub Bot commented on NUTCH-1870:
---------------------------------------

balashashanka commented on pull request #439:
URL: https://github.com/apache/nutch/pull/439#issuecomment-658252419


   Hi @sebastian-nagel, I was going through this. Out of curiosity, why hasn't 
this been merged yet?
   From the discussion, everyone seems OK with the code, and it doesn't have 
any merge conflicts.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Generic xsl parser plugin
> -------------------------
>
>                 Key: NUTCH-1870
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1870
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, parser
>    Affects Versions: 1.9
>            Reporter: Albinscode
>            Priority: Major
>         Attachments: NUTCH-1870-trunk-v3.patch, NUTCH-1870-trunk-v4.patch, 
> nutch-site.xml, xsl-parse-plugin.patch, xsl-parse-plugin2.patch
>
>
> The aim of this plugin is to use XSLT to extract metadata from HTML DOM 
> structures.
> | Your Data | --> | parse-html or Tika plugin | --> | DOM structure | --> | XSLT plugin |
>
> The main advantages are:
> - You don't have to write any Java code, only XSLT and configuration
> - It can process a DOM structure from a DocumentFragment (see NekoHTML and 
> TagSoup)
> - It is HtmlParseFilter-compatible and can be plugged in like any other 
> plugin (parse-js, parse-swf, etc.)
> This topic has been discussed on 
> http://www.mail-archive.com/dev%40nutch.apache.org/msg15257.html
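
For illustration only, and not taken from any of the attached patches: the
XSLT step in the pipeline above can be sketched with the JDK's standard
javax.xml.transform API. The class name, the stylesheet, and the sample page
are hypothetical. The sketch parses well-formed XHTML with the JDK parser,
whereas the plugin is described as working on the DOM produced by parse-html
or Tika.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: apply an XSLT stylesheet to an already-built DOM
// to pull out metadata fields. Not the plugin's actual code.
class XsltMetadataSketch {

    // Hypothetical stylesheet: emits the page title and the first <h1> as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "title=<xsl:value-of select='/html/head/title'/>;"
      + "heading=<xsl:value-of select='/html/body/h1'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String extract(String xhtml) throws Exception {
        // Stand-in for the DOM that an upstream parser would hand over.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        // Compile the stylesheet and run it against the DOM.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>Nutch</title></head>"
                    + "<body><h1>Crawler</h1></body></html>";
        System.out.println(extract(page));
    }
}
```

Changing which fields are extracted only requires editing the stylesheet
string, which is the "only XSLT and configuration" point made in the issue
description.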



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
