[jira] [Comment Edited] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere

Artem Tsikiridis (JIRA) Tue, 19 Aug 2014 02:50:41 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102081#comment-14102081
 ]


Artem Tsikiridis edited comment on FLINK-838 at 8/19/14 9:49 AM:
-----------------------------------------------------------------

Hi everybody,

I have prepared some documentation for the Hadoop compatibility layer, available here: 
https://github.com/atsikiridis/incubator-flink/blob/hadoop-docs/docs/hadoop_compatability.md
 and 
https://github.com/atsikiridis/incubator-flink/blob/hadoop-docs/docs/internal_hadoop_compatibility.md

Moreover, I have pushed the {{JobClient}} and {{Accumulators}}/{{Counters}}-related 
code here: 
https://github.com/atsikiridis/incubator-flink/tree/final-client/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapred.
It is all based on Fabian's work on the new Reduce operator.

So now, thanks to Fabian's great push on the new operator, we can say that we 
support all of Hadoop's programming interfaces. Some still have limitations 
(counters can only be obtained after job completion, and grouping of values in 
combiners is not there yet). Of course, more testing is always useful to fulfill 
the vision of a full Hadoop compatibility layer that runs "everything Hadoop" 
on Flink. And of course, there is also the mapreduce API, which also has a lot 
of users and which we should support.

Since the deadline for Google Summer of Code has just passed, I'd like to thank 
Fabian, Robert and everybody else for helping me get actively involved in 
Flink over the last couple of months :) The community has been very vibrant, and 
it was really exciting. At times it wasn't easy, but you guys were 
always there :) It has been really nice, and I feel that I have learned a lot, 
both technically and about what it feels like to be part of a rapidly growing 
open-source community :)


Of course, the deadline doesn't change much: I will still be around, because 
it would be a great pleasure to keep contributing to Flink and to keep working 
toward the Hadoop compatibility vision :) I'd say GSoC was a nice 
kickstart.

Thank you all.


> GSoC Summer Project: Implement full Hadoop Compatibility Layer for 
> Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress in implementing a 
> full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: 
> https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: 
> https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context, and the mapping of Hadoop's 
> Configuration to Stratosphere's. By successfully bridging the Hadoop 
> tasks with Stratosphere, we already cover the most basic Hadoop jobs. This 
> can be verified by running some popular Hadoop examples on Stratosphere 
> (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how running these jobs works for the wrapper (e.g. the 
> command-line interface). Implement how the user will run them. (1 - 2 
> weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, 
> Partitioners, Distributed Cache etc.) There are quite a few interfaces and it 
> will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write code documentation, write a careful README, and add more 
> unit tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)
