[ 
https://issues.apache.org/jira/browse/NUTCH-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210241#comment-15210241
 ] 

Lewis John McGibbney commented on NUTCH-2005:
---------------------------------------------

[[email protected]], please check out the issue description which I've 
updated. You should begin producing your proposal ASAP.
You can see previous proposals at 
https://wiki.apache.org/nutch/GoogleSummerOfCode#A2015 for guidance. If you 
have any issues then please let me know.

> Implement HTrace'ing in Nutch
> -----------------------------
>
>                 Key: NUTCH-2005
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2005
>             Project: Nutch
>          Issue Type: New Feature
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: gsoc2016
>             Fix For: 2.4
>
>
> Recent developments within the tracing community have brought projects like 
> Apache HTrace (Incubating) into the Apache Incubator, opening up the 
> possibility of utilizing tracing logic to better understand distributed 
> applications, systems and systems-of-systems. As many will know, tracing 
> involves a specialized use of logging to record information about a program’s 
> execution. Although many use cases involve the use of tracing within 
> distributed systems such as Hadoop and databases, few tracing experiments 
> have been carried out in the field of large scale, distributed Web search. 
> This issue will combine comprehensive tracing mechanisms in Apache HTrace 
> (Incubating) with the scalable, flexible crawling architecture presented by 
> Apache Nutch 2.X.
> As essentially every job (Inject, Generate, Fetch, Parse, UpdateDB, etc.) in 
> Nutch 2.X interacts with a stack of complex underlying components (known as 
> the search stack), comprehensive tracing would provide insight into system 
> performance, latency, etc. 
> Every job (a class which extends NutchTool and implements Tool) within Nutch 
> 2.X therefore needs to be analyzed for suitability and appropriateness for 
> tracing. Once this is understood, a ranked list of tools should be produced, 
> with the ranking based upon which tools are most suited to tracing. I 
> would suggest that FetcherJob be at the top, as it enables us to trace not only 
> the HTTPSocketConnections but also the writing of data through Gora --> 
> DataStore. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
