Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "GoogleSummerOfCode/GraphGeneratorTool" page has been changed by OmkarReddy:
https://wiki.apache.org/nutch/GoogleSummerOfCode/GraphGeneratorTool?action=diff&rev1=2&rev2=3

  <<TableOfContents>>
  
- ||'''Title :'''|||| GSoC 2016 Proposal ||
+ ||'''Title :'''|||| GSoC 2017 Proposal ||
  ||'''Issue :'''|||| [[https://issues.apache.org/jira/browse/NUTCH-2369|NUTCH-2369 - Graph Generator Tool for Nutch]]||
  ||'''Student :'''||||Omkar Reddy - omkarr [at] apache dot org||
  ||'''Mentor :'''||||Lewis John McGibbney||
  
  === Abstract ===
  
- Currently Apache Nutch[0] has the concept of a WebGraph[1] which that builds 
Web graphs, performs a stable convergent link-analysis, and updates the crawldb 
with those scores. The main purpose of building a new Graph Generator tool for 
Nutch is to create a substantiated ‘deep’ graph enabling true traversal, this 
could be a game changer for how Nutch Crawl data is interpreted. This will 
involve storage of  the crawl data as RDF datasets in the form of serialized 
n-quad statements. This graph can be used to execute queries on the webpages. 
Graph generation will be achieved using the Apache Tinkerpop[2] 
ScriptInputFormat  and ScriptOutputFormat’s[3] respectively. There are 
basically two scenarios to represent the graph as RDF datasets that we discuss 
in this proposal below.
+ Currently, Apache Nutch[0] has the concept of a WebGraph[1], which builds Web 
graphs, performs a stable, convergent link analysis, and updates the crawldb 
with those scores. The main purpose of building a new Graph Generator tool for 
Nutch is to create a substantiated ‘deep’ graph that enables true traversal; 
this could be a game changer for how Nutch crawl data is interpreted. This will 
involve storing the crawl data as RDF datasets in the form of serialized 
N-Quads statements. The resulting graph can then be used to execute queries on 
the web pages. Graph generation will be achieved using the Apache TinkerPop[2] 
ScriptInputFormat and ScriptOutputFormat[3]. This proposal discusses two 
scenarios for representing the graph as RDF datasets.
  
  === Background ===
  
