[ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032168#comment-14032168 ]
Artem Tsikiridis edited comment on FLINK-838 at 6/16/14 7:08 AM:
-----------------------------------------------------------------
Hello,
here is the report for the fourth week.
In short:
Worked more on the runtime environment of a Hadoop job (see point 2). Added
support for custom partitioning and intermediate sorting (comparator,
grouping comparator).
Prepared an environment for distributed testing.
1)
We reach the midterm evaluation of the program in two weeks' time. As
Robert suggested above, it would be nice to merge the first version
of the abstraction layer. That would cover the following Hadoop
mapred interfaces: Mapper, Reducer, Combiner, a basic driver
(just parsing the conf and starting a job), and the comparator and partitioner
interfaces which I worked on this week.
I am currently trying to improve test coverage for this branch and will try it
on the cluster today. So in a few days (around the middle of the week)
it should be virtually ready for code review and merging. Also, I would
be happy to assist with testing 777 if it is
needed.
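As a rough illustration of how the partitioner, comparator and grouping comparator from point 1 fit together, here is a minimal, self-contained sketch. It is not the code from the branch and uses neither Hadoop nor Flink classes: ShuffleSketch, KeyPartitioner, shuffle and demo are hypothetical names made up for this example, and the "letter:number" key format is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a shuffle step; none of these names are Hadoop or Flink API.
class ShuffleSketch {

    /** Stand-in for a partitioner: maps a key to one of numPartitions reducers. */
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    /**
     * Routes each key to a partition, sorts every partition with sortComparator,
     * then starts a new group whenever groupComparator sees a different key.
     * Returns partitions -> groups -> keys.
     */
    public static <K> List<List<List<K>>> shuffle(
            List<K> keys, int numPartitions, KeyPartitioner<K> partitioner,
            Comparator<K> sortComparator, Comparator<K> groupComparator) {
        List<List<K>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (K key : keys) {
            partitions.get(partitioner.partition(key, numPartitions)).add(key);
        }
        List<List<List<K>>> result = new ArrayList<>();
        for (List<K> partition : partitions) {
            partition.sort(sortComparator);
            List<List<K>> groups = new ArrayList<>();
            for (K key : partition) {
                List<K> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
                // Compare against the most recent key of the current group.
                if (last == null
                        || groupComparator.compare(last.get(last.size() - 1), key) != 0) {
                    last = new ArrayList<>();
                    groups.add(last);
                }
                last.add(key);
            }
            result.add(groups);
        }
        return result;
    }

    /** Example run with assumed "letter:number" composite keys. */
    public static List<List<List<String>>> demo() {
        List<String> keys = Arrays.asList("b:1", "a:2", "c:3", "a:1");
        KeyPartitioner<String> byFirstLetter = (key, n) -> (key.charAt(0) - 'a') % n;
        Comparator<String> sortByFullKey = Comparator.naturalOrder();
        Comparator<String> groupByPrefix =
                Comparator.comparing((String key) -> key.substring(0, key.indexOf(':')));
        return shuffle(keys, 2, byFirstLetter, sortByFullKey, groupByPrefix);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the three hooks are independent: the partitioner decides where a key goes, the sort comparator decides the order within a partition, and the grouping comparator decides which neighbouring keys end up in the same reduce call.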
2)
Then, as soon as 1 is under code review, I will work in parallel on the
advanced features of a Hadoop driver. I have some issues there, mostly because I
need to access information from Flink's Nephele cluster, which is abstracted
away, in order to have a working RunningJob for Hadoop's JobClient. As you can
see, I repeat myself a lot in the environment code. I was wondering whether it
is possible to refactor the environments (e.g. break submitJobandWait into
submitJob and wait, and generally have a wait). That is the nature of the
changes. However, I believe this discussion can take place after the midterm,
once a first version of the project is already merged.
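To make the refactoring idea in point 2 more concrete, here is a minimal sketch of splitting a blocking submit-and-wait call into a non-blocking submit plus a separate wait. This is only an assumption about what such a split could look like, not the actual Flink environment code: EnvironmentSketch and JobHandle are hypothetical names, the "job" is simulated with a CompletableFuture, and only the method names isComplete and waitForCompletion are borrowed from Hadoop's RunningJob interface.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; not the Flink client or environment API.
class EnvironmentSketch {

    /** Hypothetical handle a RunningJob-like wrapper could delegate to. */
    static final class JobHandle {
        private final CompletableFuture<String> result;

        JobHandle(CompletableFuture<String> result) {
            this.result = result;
        }

        /** Non-blocking status check. */
        public boolean isComplete() {
            return result.isDone();
        }

        /** Blocks until the simulated job has finished and returns its result. */
        public String waitForCompletion() {
            return result.join();
        }
    }

    /** Submits the (simulated) job and returns immediately with a handle. */
    public static JobHandle submitJob(String jobName) {
        return new JobHandle(CompletableFuture.supplyAsync(() -> "finished:" + jobName));
    }

    /** The old blocking behaviour, expressed through the two smaller steps. */
    public static String submitJobAndWait(String jobName) {
        return submitJob(jobName).waitForCompletion();
    }

    public static void main(String[] args) {
        JobHandle handle = submitJob("wordcount");
        System.out.println(handle.waitForCompletion());
    }
}
```

With a split like this, the existing blocking call stays available for current users, while a driver that needs to poll or report progress can hold on to the handle instead.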
> GSoC Summer Project: Implement full Hadoop Compatibility Layer for
> Stratosphere
> -------------------------------------------------------------------------------
>
> Key: FLINK-838
> URL: https://issues.apache.org/jira/browse/FLINK-838
> Project: Flink
> Issue Type: Improvement
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a
> full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki:
> https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal:
> https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's
> Configuration to the one of Stratosphere. By successfully bridging the Hadoop
> tasks with Stratosphere, we already cover the most basic Hadoop Jobs. This
> can be determined by running some popular Hadoop examples on Stratosphere
> (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how the running of these jobs works (e.g. command line
> interface) for the wrapper. Implement how the user will run them. (1 - 2
> weeks).
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators,
> Partitioners, Distributed Cache etc.) There are quite a few interfaces and it
> will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more
> unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open