Hi,
My problem: integrating a Giraph LinkRank implementation into the Nutch 2.x
DbUpdate stage.
In Nutch 2.x, DbUpdateJob applies DbUpdateMapper and DbUpdateReducer to the
WebPage objects. Some scoring filters are activated as well.
I want to export this data as
nodes:
http://www.google.com 0.33
http://www.yahoo.com 0.33
http://www.bing.com 0.33
edges:
http://www.google.com http://www.yahoo.com
http://www.yahoo.com http://www.bing.com
http://www.bing.com http://www.google.com
and then use this data in my Giraph LinkRank implementation. I could not
figure out where and how to extract this data, how to call a Giraph job on
it, and how to use the results in the next steps of Nutch 2.x.
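To make the target format concrete, here is a minimal Java sketch of the
export step I have in mind. It does not use the Nutch, Gora, or Giraph APIs:
the Page class is a hypothetical stand-in for the URL, score, and outlinks of
a WebPage, and the class and method names are placeholders of my own. The
two-decimal score format is also just an assumption taken from the sample
above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LinkGraphExport {

    /** Hypothetical stand-in for the WebPage fields needed here (not the Nutch class). */
    static final class Page {
        final String url;
        final float score;
        final List<String> outlinks;

        Page(String url, float score, List<String> outlinks) {
            this.url = url;
            this.score = score;
            this.outlinks = outlinks;
        }
    }

    /** Builds one "url score" line per page (the "nodes" section). */
    static List<String> nodeLines(List<Page> pages) {
        List<String> lines = new ArrayList<>();
        for (Page p : pages) {
            lines.add(String.format(Locale.ROOT, "%s %.2f", p.url, p.score));
        }
        return lines;
    }

    /** Builds one "source target" line per outlink (the "edges" section). */
    static List<String> edgeLines(List<Page> pages) {
        List<String> lines = new ArrayList<>();
        for (Page p : pages) {
            for (String target : p.outlinks) {
                lines.add(p.url + " " + target);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same three-page cycle as in the sample data above.
        List<Page> pages = Arrays.asList(
            new Page("http://www.google.com", 0.33f, Arrays.asList("http://www.yahoo.com")),
            new Page("http://www.yahoo.com", 0.33f, Arrays.asList("http://www.bing.com")),
            new Page("http://www.bing.com", 0.33f, Arrays.asList("http://www.google.com"))
        );
        System.out.println("nodes:");
        nodeLines(pages).forEach(System.out::println);
        System.out.println("edges:");
        edgeLines(pages).forEach(System.out::println);
    }
}
```

In the real setup the Page objects would presumably come from the WebPage
store rather than a hard-coded list, which is exactly the part I am unsure
about.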
Should I write a Gora job for exporting/importing the data in HBase and a
separate Giraph job for reading from HBase and running the LinkRank code?
Best,