[Nutch Wiki] Trivial Update of "ThirdReport" by LewisJohnMcgibbney

Apache Wiki Wed, 30 Jul 2014 18:22:29 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "ThirdReport" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/ThirdReport?action=diff&rev1=7&rev2=8

- ## page was copied from SecondReport
- = Google Summer of Code 2014 Report 2 =
+ = Google Summer of Code 2014 Report 3 =
  '''Project Name''': NUTCH-841 Create a Wicket-based Web Application for Nutch 
2.X
  
- '''Report date''': 11th July 2014
+ '''Report date''': 30th July 2014
  
  '''Student Name''': Fjodor Vershinin
  
@@ -16, +15 @@

  The main goal of this project is to create an Apache Wicket-based Web GUI for 
Apache Nutch 2.X.
  
  == Review of Previous Actions ==
+  * Change entire build structure to Ant + Ivy as per existing 2.x codebase 
-  * --(If possible create a graphic of the REST API as it exists in his 
proposed patch for 
[[https://issues.apache.org/jira/browse/NUTCH-1769|NUTCH-1769 API refactoring]] 
this should only include the information included in his above commentary on 
the topic.)--
-  * --(Provide links to the '''HTML Prototype''', I have not seen any of this 
code and therefore cannot assert that progress has been made as described 
above.)--
-  * --(Provide links/patches for the '''application skeleton''' as stated 
above... I have yet to see any code.)--
  
  == Objectives ==  
+ '''To be completed by student'''
+ 
-    * Add ability to get logs by REST API
-    * --(Implement generic crawl cycle in GUI)--
-    * Add ability to upload seed files (or post seed data) by REST API
  === Contributions to Nutch community ===
+ '''To be completed by student'''
- In the previous week (13.07-19.07) I worked on the most challenging task: 
implementing the crawl cycle in the GUI. The hardest part was tracking task 
status, which I solved with simple polling. Another option is to post the whole 
batch of jobs to the Nutch server and shift all responsibility to the server 
side. You can see my pull request on 
[[https://bitbucket.org/feodorv/uinutch/pull-request/2/implemented-crawling-script| bitbucket]]. 
I also proposed minor API changes and created an issue with a small patch for 
the generate component: 
[[https://issues.apache.org/jira/browse/NUTCH-1819|NUTCH-1819]]
- 19.07-27.07
- I created a page that allows creating and running remote crawls. The main 
issue concerned asynchronous execution and displaying progress. I implemented 
this using Spring's @Async annotation and Spring's executor. Progress reporting 
uses a polling mechanism, which could be replaced by wicket-atmosphere in the 
future, so that HTML5 WebSockets can be used instead of polling. 
- Also, some refactoring has been done and a bug in the test execution process 
has been fixed.
- 
[[https://bitbucket.org/feodorv/uinutch/pull-request/4/implemented-crawls-page| 
pull request]]
- 
- The next task is seed upload; after that we can run our application on 
Apache's VM. For seed upload, I would propose not uploading files but adding 
the ability to create seed lists on the UI side; these can be uploaded through 
the API, and the Nutch server will create the seed file.
- This option can make seed management much easier. The second question is 
about the data store. Currently the UI app has to store too much information in 
a plaintext properties file. I would propose using an embedded H2 Java 
database, so that data management would not be an issue.
  
  
  == Future Actions ==
+ '''To be completed by student'''
-  * Change entire build structure to Ant + Ivy as per existing 2.x codebase
- '''Remainder to be filled out by student'''
   
     
  == Mentors Comments ==
- Fjodor's code has come on well since last reporting. We have been working to 
get a VM established on Apache Infrastructure with limited success, however I 
am going to host Fjodor's work at http://any23-vm.apache.org as an intermediate 
step to achieving the goals and aim of this GSoC project.
+ '''To be completed by mentor'''
  
- We have not been communicating very much, which (whilst it may suit Fjodor) 
does not necessarily suit myself. This being said, I do however very much 
appreciate the direction in which he is taking this project and of course the 
initiative he is showing by keeping the project moving at a reasonable pace.
+ Signed:
  
- Our next step will be to get the VM established; we can then patch up a local 
copy of Nutch 2.x, keeping it in sync with the 2.X branch. I have a vision that 
we will establish an HBase server on the VM as well, for which we will simply 
truncate the WebPage table on an hourly basis in an effort to prevent the 
VM from maxing out storage space. I will get this set up and provisioned once 
Fjodor is ready for his code to be displayed on a public level.
- 
- Overall, another good reporting period, however we still have a lot of work 
to do, including engaging in a close review of the codebase, based on user 
feedback from within the Nutch community. 
- 
- Signed: Lewis John McGibbney (lewismc)
-
