[
https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14899934#comment-14899934
]
Sebastian Nagel commented on NUTCH-2110:
----------------------------------------
Hi Asitang, the Injector is already able to store key-value pairs from the seed
list in the CrawlDb, within the CrawlDatum's metadata, see
[[1|http://nutch.apache.org/apidocs/apidocs-1.10/org/apache/nutch/crawl/Injector.html]].
If the XPath statements are not too complex, this would be the easiest way:
the protocol plugin could then read the XPath from the CrawlDatum.
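A minimal, self-contained sketch of the idea, assuming the Injector's documented seed format: the URL comes first, followed by tab-separated key=value pairs. This is not Nutch's code. In Nutch the values end up in the CrawlDatum's metadata (a Hadoop MapWritable with Text keys), which this sketch replaces with a plain Map so that it runs without Nutch on the classpath. The key name "selenium.xpath" and the class SeedLineSketch are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified sketch of splitting a seed line of the form
 * "url<TAB>key=value<TAB>key=value ..." into a URL and metadata.
 * Hypothetical illustration only, not Nutch's Injector implementation.
 */
public class SeedLineSketch {

    /** Holds the URL and the key-value metadata parsed from one seed line. */
    static final class Seed {
        final String url;
        final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Splits a seed line on tabs; the first field is the URL. */
    static Seed parse(String line) {
        String[] fields = line.split("\t");
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            // Split on the FIRST '=' only, so values such as XPath
            // predicates (//input[@name='q']) may themselves contain '='.
            int eq = fields[i].indexOf('=');
            if (eq == -1) {
                continue; // ignore fields without a key=value separator
            }
            meta.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
        }
        return new Seed(fields[0].trim(), meta);
    }

    public static void main(String[] args) {
        // "selenium.xpath" is a hypothetical key chosen for this example.
        String line = "http://example.com/search\tselenium.xpath=//input[@name='q']";
        Seed seed = parse(line);
        // A protocol plugin would do the equivalent lookup on the
        // CrawlDatum's metadata instead of on this plain Map.
        System.out.println(seed.url);
        System.out.println(seed.metadata.get("selenium.xpath"));
    }
}
```

Splitting only on the first '=' matters here because XPath predicates routinely contain '='. A tab inside an XPath expression, on the other hand, would break this line format, which is one reason the approach suits only "not too complex" XPath statements.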
Regarding the "state of a selenium operation": should the state be passed to
the outlinks of a page, or is the same page fetched multiple times with varying
Ajax/JavaScript actions performed?
> Create the capability to provide seeds in the form of "url+xpath(including
> option to enter search terms).selenium"
> ------------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-2110
> URL: https://issues.apache.org/jira/browse/NUTCH-2110
> Project: Nutch
> Issue Type: Sub-task
> Components: fetcher
> Affects Versions: 1.10
> Reporter: Asitang Mishra
> Labels: memex
>
> Create the capability to provide seeds in the form of "url+xpath(including
> option to enter search terms).selenium" to be used by selenium
> protocols/plugins as urls/flow to reach a specific Ajax-based page or save
> the state of a selenium operation for the next fetching round.