Hello, and thank you for this review. You pointed out thread-safety issues
that I was not aware of. As an optimization I was reusing the transformer
instead of creating a new one each time, and I did not see the problem.
Your review is accurate, and I especially appreciate the list of
links you provided to illustrate your remarks.
I'll update the code to follow your recommendations and wait for your next
comments 😂.
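
A minimal sketch of the direction suggested in the review, as I understand it: compile the stylesheet once (which is what setConf() would do) into a `Templates` object, which the JAXP documentation describes as thread-safe, and give each thread its own `Transformer` through a `ThreadLocal`. The class name `XslFilterSketch`, the inline stylesheet and the `transform` helper are hypothetical and are not taken from the patch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch, not the code from the patch.
final class XslFilterSketch {

  // Inline stylesheet so the example is self-contained (assumption: the
  // real plugin would read its XSLT from a configured resource instead).
  private static final String XSL =
      "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/doc/title'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

  // Compiled once and shared by all threads; a load failure shows up here,
  // early, rather than on the first call to the filter.
  private static final Templates TEMPLATES = compile(XSL);

  // One Transformer per thread, since Transformer instances must not be
  // used by several threads concurrently.
  private static final ThreadLocal<Transformer> TRANSFORMER =
      new ThreadLocal<Transformer>() {
        @Override
        protected Transformer initialValue() {
          try {
            return TEMPLATES.newTransformer();
          } catch (TransformerException e) {
            throw new IllegalStateException("Cannot create transformer", e);
          }
        }
      };

  private static Templates compile(String xsl) {
    try {
      return TransformerFactory.newInstance()
          .newTemplates(new StreamSource(new StringReader(xsl)));
    } catch (TransformerException e) {
      throw new IllegalStateException("Cannot compile stylesheet", e);
    }
  }

  // Stand-in for the work done inside filter(): applies the stylesheet
  // using the calling thread's own Transformer.
  static String transform(String xml) {
    StringWriter out = new StringWriter();
    try {
      TRANSFORMER.get().transform(
          new StreamSource(new StringReader(xml)), new StreamResult(out));
    } catch (TransformerException e) {
      throw new IllegalStateException("Transformation failed", e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(transform("<doc><title>Hello</title></doc>"));
  }
}
```

Making the transformer a local variable inside the method would work as well; the thread-local variant only avoids creating a new instance on every call.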
Thanks again,
Albin
On 9 Oct 2014 21:52, "Sebastian Nagel (JIRA)" <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165632#comment-14165632
> ]
>
> Sebastian Nagel commented on NUTCH-1870:
> ----------------------------------------
>
> Thanks, [~Albinscode], for the patch. Looks nice, code is well formatted,
> ... I'll continue testing, but a few first comments:
> * could load resources in setConf(conf) and not do it on-demand in the
> "filter" method:
> ** setConf() is called early, so failures in reading configuration
> resources are reported soon
> ** filter() may be called concurrently because for every plugin only one
> instance is held per extension point
> * thread-safety: the filter() method must be thread-safe, and so must be
> all used object instances. [Transformer|
> http://docs.oracle.com/javase/7/docs/api/javax/xml/transform/Transformer.html]
> instances are not safe and may not be shared by threads. That's also true
> for other DOM/XML related classes, cf. [1|
> http://mail-archives.apache.org/mod_mbox/nutch-user/201301.mbox/%[email protected]%3E],
> [2|
> http://mail-archives.apache.org/mod_mbox/nutch-user/201301.mbox/%[email protected]%3E],
> or NUTCH-1596. Possible solutions are, e.g., to make these variables local
> or [thread local|
> http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html].
>
> > there are some unit tests strongly related to sites I'm crawling
> * would be better to take sample pages where we are sure not to violate
> any copyright
>
> > Generic xsl parser plugin
> > -------------------------
> >
> >                 Key: NUTCH-1870
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-1870
> >             Project: Nutch
> >          Issue Type: New Feature
> >          Components: indexer, parser
> >    Affects Versions: 1.9
> >            Reporter: Albinscode
> >             Fix For: 1.10
> >
> >         Attachments: xsl-parse-plugin.patch
> >
> >
> > The aim of this plugin is to use XSLT to extract metadata from HTML DOM
> structures.
> > | Your Data | --> | Parse-html plugin  or TIKA plugin | --> | DOM
> structure | --> |XSLT plugin |
> >
> >
> > The main advantages are:
> > - You won't have to produce any Java code, only XSLT and configuration
> > - It can process DOM structure from DocumentFragment (@see NekoHtml and
> @see TagSoup)
> > - It is HtmlParseFilter plugin compatible and can be plugged as any
> other plugin (parse-js, parse-swf, etc...)
> > This topic has been discussed on
> http://www.mail-archive.com/dev%40nutch.apache.org/msg15257.html
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
