Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "GoogleSummerOfCode/SecurityLayer" page has been changed by kamaci:
https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer

New page:
<<TableOfContents(4)>>

||'''Title 
:'''||||[[https://summerofcode.withgoogle.com/projects/#6099177868099584|GSOC 
2016 Proposal]]||
||'''Issue :'''|||| 
[[https://issues.apache.org/jira/browse/NUTCH-1756|NUTCH-1756 - Security layer 
for NutchServer]]||
||'''Student :'''||||Furkan KAMACI - furkankamaci [at] gmail.com||
||'''Mentor :'''||||[[https://wiki.apache.org/nutch/LewisJohnMcgibbney|Lewis 
John McGibbney]]||

=== Abstract ===

Apache Nutch is a highly extensible and scalable open source web crawler 
software project. Stemming from Apache
Lucene, the project has diversified and now comprises two codebases, namely:
Nutch 1.x and Nutch 2.x.
This proposal aims to develop a security layer for Nutch 2.x.

=== Additional Info ===

http://www.furkankamaci.com/

Furkan KAMACI

Istanbul Technical University

Graduate School of Science, Engineering and Technology

Computer Engineering

Istanbul, Turkey

=== 1. Introduction ===

Apache Nutch comprises two codebases, namely:

Nutch 1.x: A well matured, production ready crawler. 1.x enables fine grained 
configuration, relying on Apache Hadoop data
structures, which are great for batch processing.

Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but 
which differs in one key area; storage is abstracted
away from any specific underlying data store by using Apache Gora for handling 
object to persistent mappings. This means we
can implement an extremely flexible model/stack for storing everything (fetch 
time, status, content, parsed text, outlinks, inlinks,
etc.) into a number of NoSQL storage solutions.

=== 2. Definition of the Problem ===

Nutch 2.x has a REST API and web application but it doesn't have a security 
layer on it. A security layer should be
implemented which covers security functionality (authentication, 
authorization), different authentication mechanisms ,
documentation and refactoring existing code. This project will therefore 
propose the design, development and implementation of
the security agenda as described above. This work will be specifically 
applicable to the Nutch 2.X codebase.

=== 3. Proposed Method ===

==== A. Background ====

There has been implemented an API which lets to interact with Nutch via REST 
API. Administration and configuration tasks
can be done via this API.

This proposal offers a comprehensive security layer under NUTCH­1756. Existing 
code should be re­factored, security layer
should be added and a documentation should be done programmatically (i.e. 
Miredot, Swagger).

==== B. Suggested Steps for Proposed Method ====

1) Authentication/Authorization Implementation

2) API Documentation Generator Implementation

=== 4. Schedule & Timeline ===

Suggested schedule and timeline is as follows:

1) ''Analyzing The Problem (1 Week ­ 30 May 2016)''
  a) Problem will be analyzed with more detail.

2) ''Authentication Implementation (5 weeks ­ 4 July 2016)''
  a) HTTP Basic authentication
  
  b) HTTP Digest authentication
  
  c) SSL client authentication
  
  d) Kerberos Authentication

3) ''Authorization Implementation (3 weeks ­ 25 July 2016)''
  a) Authorization will be implemented

4) ''API Documentation (1 week ­ 1 August 2016)''
  a) API Documentation implementation

5) ''Test (1 week ­ 8 August 2016)''
  a) Implementation tests will be written and run.

6) ''Documentation (1 week ­ 15 August 2016)''
  a) Documentation will be prepared.

=== References ===

[1] https://issues.apache.org/jira/browse/NUTCH­1756

[2] https://wiki.apache.org/nutch/NutchRESTAPI

[3] https://issues.apache.org/jira/browse/GORA­386

[4] https://issues.apache.org/jira/browse/NUTCH­2243

[5] https://issues.apache.org/jira/browse/NUTCH­2022

[6] https://github.com/apache/nutch

[7] https://en.wikipedia.org/wiki/Apache_Nutch

[8] https://github.com/apache/gora

[9] http://en.wikipedia.org/wiki/Apache_Gora

[10] http://gora.apache.org/current/tutorial.html#introduction


=== Reports ===

[[https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/WeeklyReports|Weekly
 Reports]]

[[https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/MidtermReport|Midterm
 Report]]

[[https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/FinalReport|Final
 Report]]

Reply via email to