[ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383121#comment-15383121 ]
Markus Jelsma edited comment on NUTCH-1414 at 7/18/16 9:35 PM:
---------------------------------------------------------------
Java provides Perl-style regular expressions out of the box (java.util.regex);
the syntax is largely, though not entirely, PCRE-compatible. Online tools such
as regexplanet.com let you quickly verify your regexes. Make sure you have some
unit tests to verify your plugin. Again, see the examples in the referenced
plugins and in other Nutch plugins.
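
As a rough illustration of the advice above, here is a minimal sketch of
checking a date regex with plain java.util.regex before wiring it into a
plugin. The class name DateRegexSketch, the method firstDate and the
yyyy-MM-dd pattern are assumptions made for this example; they are not taken
from the NUTCH-1414 patch or from any Nutch API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch or the patch.
final class DateRegexSketch {

    // Assumed format: ISO-like yyyy-MM-dd with basic month/day range checks.
    private static final Pattern ISO_DATE = Pattern.compile(
        "\\b(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])\\b");

    // Returns the first date-like substring found in the text, if any.
    static Optional<String> firstDate(String text) {
        Matcher m = ISO_DATE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Quick manual check; a real plugin would cover this in unit tests.
        System.out.println(firstDate("Posted on 2016-07-18 by the editor"));
        System.out.println(firstDate("No date here"));
    }
}
```

A unit test for a real plugin would presumably make the same kind of checks
(one input that must match, a few that must not) against parse text taken
from actual pages.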
was (Author: markus17):
Java provides proper PCRE compatible regular expressions by default. Using
online tools like regexplanet.com will help you to quickly verify your regexes.
Make sure you have some unit tests to verify your plugin.
> Date extraction parse filter
> ----------------------------
>
> Key: NUTCH-1414
> URL: https://issues.apache.org/jira/browse/NUTCH-1414
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Markus Jelsma
> Attachments: NUTCH-1414-1.6-1-testdata.patch, NUTCH-1414-1.6-1.patch
>
>
> Date extraction parse filter for Nutch to provide means to extract an
> arbitrary page date (article date) from the parse text.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)