[jira] [Commented] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere

Artem Tsikiridis (JIRA) Tue, 01 Jul 2014 23:31:33 -0700

    [ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049664#comment-14049664 ]


Artem Tsikiridis commented on FLINK-838:
----------------------------------------

Hello,

Here is a report on what has happened over the last two weeks.

In short:

I prepared a branch with the stable parts of the code, which is currently in 
code review. The tests passed on a two-machine cluster I have set up. Since 
that was my first time running something on a Flink cluster, maybe you can 
verify.

Handling of InputFormats can be improved after 
https://github.com/apache/incubator-flink/pull/52.

More cleanup has been done, along with improved exception handling in 
OutputCollector to mimic Hadoop's messages with concrete types.

Moreover, I've been working on sorting. I am confident that by the end of the 
week I can have a Hadoop Secondary Sorting example running. Maybe I can submit 
it in a separate PR when it's cleaner and has tests (again, by the end of the 
current week)?

Then, we have six weeks left, and the remaining features (correct me if I'm 
wrong) according to the initial project plan are the DistributedCache (I have 
not done anything yet) and all the advanced options of a JobConf (monitoring, 
counters) (I have a locally running example with Counters - Accumulators, but 
it's not very clean). Actually, may I suggest some restructuring of the code? 
For example, how about splitting submitJobAndWait into submit and wait 
(protected methods)? This is the kind of change I would like to suggest. I 
think I will put these changes in a separate branch and ping you here in the 
next few days.
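
To illustrate the proposed split, here is a minimal sketch. It is not taken 
from the Flink code base: SketchJobClient, JobHandle and waitForCompletion are 
hypothetical names, and the job itself is simulated with a CompletableFuture. 
The wait step is called waitForCompletion here only to avoid confusion with 
Object.wait().

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a job client; not the actual Flink class.
class SketchJobClient {

    // Hypothetical handle returned by submit() and consumed by waitForCompletion().
    static final class JobHandle {
        final String jobName;
        final CompletableFuture<String> result;

        JobHandle(String jobName, CompletableFuture<String> result) {
            this.jobName = jobName;
            this.result = result;
        }
    }

    // The existing entry point keeps its behavior and delegates to the two steps.
    public String submitJobAndWait(String jobName) {
        JobHandle handle = submit(jobName);
        return waitForCompletion(handle);
    }

    // Step 1: hand the job over and return immediately.
    // Assumption: the job is simulated by an asynchronous task.
    protected JobHandle submit(String jobName) {
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "FINISHED:" + jobName);
        return new JobHandle(jobName, result);
    }

    // Step 2: block until the job has finished and return its final status.
    protected String waitForCompletion(JobHandle handle) {
        return handle.result.join();
    }
}
```

With the two steps exposed as protected methods, a subclass (for example a 
Hadoop-style client that polls for progress) could override only the waiting 
part and leave submission untouched.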

Please feel free to ask me any questions.


> GSoC Summer Project: Implement full Hadoop Compatibility Layer for 
> Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a 
> full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: 
> https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: 
> https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's 
> Configuration to the one of Stratosphere. By successfully bridging the Hadoop 
> tasks with Stratosphere, we already cover the most basic Hadoop Jobs. This 
> can be determined by running some popular Hadoop examples on Stratosphere 
> (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how the running of these jobs works (e.g. command line 
> interface) for the wrapper. Implement how the user will run them. (1 - 2 
> weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, 
> Partitioners, Distributed Cache etc.) There are quite a few interfaces and it 
> will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more 
> unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)
