[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547778#comment-14547778 ]
Asitang Mishra commented on NUTCH-2011:
---------------------------------------

Hi [~wastl-nagel],

- The answer to your first two questions is yes: your interpretations are correct.
- Third question: the FetchNodeDb info will be used to build a D3 graph that shows, in real time, which page is being fetched and, if it was fetched properly, which outlinks it generated. We need to output this as a visualization before the data is written into the segments.
- I agree that we don't need an extra persistent layer, since all the data is already stored segment-wise, which is the same as "round-wise". [~chrismattmann] and I had discussed this before.
- Although a buffer queue is an appealing idea, we are not using it because we wanted to make things more RESTful (so the user/graph can request pages from any index to any index from the temporary store/NodeDb, or all the data from any previously updated segment). Also, if the program requests the nodes again after a failure and the buffer queue no longer has them, we would have to wait for the round to end and read them from the segment. That said, we can look into [~wastl-nagel]'s idea, provided some strict or cautionary measures are taken on the client side :).

What do you think, [~chrismattmann] and [~sujenshah]?

> Endpoint to support realtime JSON output from the fetcher
> ---------------------------------------------------------
>
>                 Key: NUTCH-2011
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2011
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: fetcher, REST_api
>            Reporter: Sujen Shah
>            Assignee: Chris A. Mattmann
>              Labels: memex
>             Fix For: 1.11
>
>
> This fix will create an endpoint to query the Nutch REST service and get a
> real-time JSON response of the current/past Fetched URLs.
> This endpoint also includes pagination of the output to reduce data transfer
> bw in large crawls.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
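
As a rough illustration of the trade-off raised in the comment above (index-addressable reads versus a buffer queue), here is a minimal Python sketch. The class and method names (`IndexedNodeStore`, `BufferQueue`, `get_range`, `drain`) and the node fields are hypothetical and are not Nutch's actual FetchNodeDb or REST API; the sketch only shows why a repeated request behaves differently in the two designs.

```python
from collections import deque


class IndexedNodeStore:
    """Hypothetical temporary node store (not Nutch's FetchNodeDb API)."""

    def __init__(self):
        self._nodes = []

    def put(self, node):
        self._nodes.append(node)

    def get_range(self, start, end):
        """Return nodes with index in [start, end); reading removes nothing."""
        return self._nodes[start:end]


class BufferQueue:
    """Hypothetical drain-once alternative: each node can be read only once."""

    def __init__(self):
        self._queue = deque()

    def put(self, node):
        self._queue.append(node)

    def drain(self, count):
        """Pop up to `count` nodes; they are gone from the queue afterwards."""
        drained = []
        while self._queue and len(drained) < count:
            drained.append(self._queue.popleft())
        return drained


def make_nodes(n):
    # Made-up records: URL, fetch status, and outlinks, as the comment describes.
    return [
        {"url": f"http://example.com/{i}", "status": 200, "outlinks": []}
        for i in range(n)
    ]
```

With the indexed store, a client that fails midway can simply re-issue `get_range(1, 3)` and receive the same nodes. With the queue, nodes already returned by `drain(3)` are no longer available, so the client would have to wait for the round to finish and read them from the segment.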