[ 
https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14899934#comment-14899934
 ] 

Sebastian Nagel commented on NUTCH-2110:
----------------------------------------

Hi Asitang, the Injector can already store key-value pairs from the seed 
list in the CrawlDb, within the CrawlDatum's metadata; see 
[[1|http://nutch.apache.org/apidocs/apidocs-1.10/org/apache/nutch/crawl/Injector.html]].
 If the XPath statements are not too complex, this would be the easiest way: 
the protocol plugin could then read the XPath from the CrawlDatum.
Regarding the "state of a selenium operation": should a state be passed to 
the outlinks of a page, or is the same page fetched multiple times with varying 
Ajax/JavaScript actions to be performed?
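
To illustrate the seed-list approach described above, here is a minimal, 
self-contained sketch of how a seed line with tab-separated key=value 
metadata could be split into a URL and a metadata map. It does not use any 
Nutch classes. The class name SeedLineSketch and the keys selenium.xpath and 
selenium.term are hypothetical and were chosen only for this example. 
Splitting each field at the first "=" is an assumption made here so that an 
XPath containing "=" survives intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: mimics the idea of a seed line
// of the form  url <TAB> key=value <TAB> key=value ...
class SeedLineSketch {

    /** Returns the URL part of a seed line (everything before the first tab). */
    static String parseUrl(String seedLine) {
        int tab = seedLine.indexOf('\t');
        return (tab < 0 ? seedLine : seedLine.substring(0, tab)).trim();
    }

    /** Returns the key=value pairs that follow the URL, in seed-line order. */
    static Map<String, String> parseMetadata(String seedLine) {
        Map<String, String> meta = new LinkedHashMap<>();
        String[] fields = seedLine.split("\t");
        for (int i = 1; i < fields.length; i++) {
            // Split at the FIRST '=' only, so values such as
            // //input[@name='q'] keep their own '=' characters.
            int eq = fields[i].indexOf('=');
            if (eq < 0) {
                continue; // not a key=value field, skip it
            }
            String key = fields[i].substring(0, eq).trim();
            String value = fields[i].substring(eq + 1).trim();
            meta.put(key, value);
        }
        return meta;
    }

    public static void main(String[] args) {
        // Assumed example seed line; the metadata keys are made up.
        String seed = "http://example.com/search"
                + "\tselenium.xpath=//input[@name='q']"
                + "\tselenium.term=nutch";
        System.out.println(parseUrl(seed));
        System.out.println(parseMetadata(seed));
    }
}
```

In an actual protocol plugin the values would presumably be read from the 
CrawlDatum's metadata (CrawlDatum.getMetaData()) rather than parsed again. 
Because the tab is the field separator, an XPath containing a literal tab 
could not be expressed this way.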

> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter search terms).selenium" 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2110
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2110
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: fetcher
>    Affects Versions: 1.10
>            Reporter: Asitang Mishra
>              Labels: memex
>
> Create the capability to provide seeds in the form of "url+xpath(including 
> option to enter search terms).selenium", to be used by selenium 
> protocols/plugins as URLs/flows to reach a specific Ajax-based page or to 
> save the state of a selenium operation for the next fetching round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
