[
https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061429#comment-14061429
]
Artem Tsikiridis commented on FLINK-838:
----------------------------------------
Hello,
This week:
1) I got feedback for my current PR and it's almost merged! Thanks guys.
https://github.com/apache/incubator-flink/pull/37
2) Sorting is finished. I ran several tests and a full secondary sort example
on Flink (this one:
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/,
modified for {{mapred}}), which is cool, because it shows that the current
interfaces work nicely together.
3) I also ran a couple of examples using mapred's {{DistributedCache}}.
Basically, we can just consider it an abstraction on top of {{JobConf}}. The
real problem is that we have to {{configure}} the {{JobConfigurable}} user
functions ({{Mapper}} and {{Reducer}}) during deserialization, so that the conf
reaches every task on every node! I achieved this with a small refactoring of
my in-review code that serializes/deserializes the whole conf. It is in a
separate branch, and I'll propose it in the next few days.
If a user requests local files, there is no problem. If the user requests
files from HDFS, it works transparently, provided the underlying FS is actually
HDFS. Tomorrow I should test what happens when it isn't.
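Item 2 refers to secondary sort. For readers unfamiliar with the pattern, here is a minimal, stdlib-only Java 16+ sketch of the idea. It is not the code of the example linked above and uses no Hadoop or Flink API. Records are sorted by a composite key (natural key, then secondary value) and then grouped by the natural key alone, so each reduce call sees its values already in order. In {{mapred}} these two roles are played by the comparators set via {{JobConf.setOutputKeyComparatorClass}} and {{JobConf.setOutputValueGroupingComparator}}. The class, record and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
    // Composite key: the natural key used for grouping plus a secondary field
    // that only influences the order of values inside a group.
    record Pair(String key, int value) {}

    // "Sort comparator": order by natural key first, then by secondary value.
    static final Comparator<Pair> SORT_ORDER =
            Comparator.comparing(Pair::key).thenComparingInt(Pair::value);

    // Simulates shuffle + grouping: sort by the composite key, then collect values
    // per natural key ("grouping comparator"). Because the input is already sorted,
    // each group's values come out in ascending secondary order.
    static Map<String, List<Integer>> sortAndGroup(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> records = List.of(
                new Pair("b", 3), new Pair("a", 9), new Pair("b", 1), new Pair("a", 2));
        // Each entry plays the role of one reduce call with pre-sorted values.
        sortAndGroup(records).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Running {{main}} prints {{a -> [2, 9]}} followed by {{b -> [1, 3]}}. The values within each group arrive sorted, and that ordering is the property a secondary sort job relies on.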
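Item 3 describes shipping the whole conf together with the task and calling {{configure}} when the task is deserialized on a worker. Below is a minimal, stdlib-only Java sketch of that mechanism under stated assumptions. The conf is modelled as a plain {{Map<String, String>}}. {{ConfigurableTask}}, {{SplittingMapper}}, {{TaskWrapper}} and the key {{sketch.separator}} are hypothetical stand-ins, not Hadoop's {{JobConfigurable}}/{{JobConf}} or the code in the branch mentioned above. Plain Java serialization stands in for however Flink actually ships user functions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ConfShippingSketch {
    // Hypothetical stand-in for mapred's JobConfigurable.configure(JobConf).
    interface ConfigurableTask {
        void configure(Map<String, String> conf);
    }

    // Hypothetical user function whose state is derived from the configuration.
    static class SplittingMapper implements ConfigurableTask, Serializable {
        transient String separator; // not shipped itself; rebuilt by configure()

        @Override
        public void configure(Map<String, String> conf) {
            separator = conf.getOrDefault("sketch.separator", "\t");
        }
    }

    // Ships the whole conf next to the task and re-configures the task as part of
    // deserialization, so every copy on every worker sees the same settings.
    static class TaskWrapper implements Serializable {
        private final ConfigurableTask task;
        private transient Map<String, String> conf;

        TaskWrapper(ConfigurableTask task, Map<String, String> conf) {
            this.task = task;
            this.conf = new HashMap<>(conf);
            task.configure(this.conf);
        }

        ConfigurableTask task() {
            return task;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the (serializable) task
            out.writeInt(conf.size());
            for (Map.Entry<String, String> e : conf.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restores the task, without its transient state
            int size = in.readInt();
            conf = new HashMap<>();
            for (int i = 0; i < size; i++) {
                conf.put(in.readUTF(), in.readUTF());
            }
            task.configure(conf); // the step that makes the conf reach the task
        }
    }

    // Serializes and deserializes an object, as shipping it to a worker would.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TaskWrapper shipped = roundTrip(
                new TaskWrapper(new SplittingMapper(), Map.of("sketch.separator", ",")));
        System.out.println(((SplittingMapper) shipped.task()).separator);
    }
}
```

Hadoop's {{Configuration}} (and therefore {{JobConf}}) implements {{Writable}}. A real implementation would therefore more likely use its {{write}}/{{readFields}} methods instead of the hand-written loop above.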
So now we have all the programming interfaces. Next, I will return to the
mapping of the {{JobClient}} and the underlying {{JobConf}}, which involves a lot
of interesting stuff like counters. I have already done some work here and
mapped counters to accumulators, but it was kinda hacky and I had to get down to
the Nephele level. I'll see if I can improve that code.
Moreover, in parallel I will be working on finding the best way to run a script
without changing a single line of code. It's a bit tricky, as we have to replace
the user's {{JobClient}} implementation with our own. I have tried some
approaches and will let you know soon.
Feel free to ask me anything :)
Thanks!
> GSoC Summer Project: Implement full Hadoop Compatibility Layer for
> Stratosphere
> -------------------------------------------------------------------------------
>
> Key: FLINK-838
> URL: https://issues.apache.org/jira/browse/FLINK-838
> Project: Flink
> Issue Type: Improvement
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a
> full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki:
> https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal:
> https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's
> Configuration to Stratosphere's. By successfully bridging the Hadoop
> tasks with Stratosphere, we already cover the most basic Hadoop jobs. This
> can be verified by running some popular Hadoop examples on Stratosphere
> (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how these jobs are run (e.g. command line interface) and
> implement how the user will run them through the wrapper. (1 - 2
> weeks).
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop interfaces (Comparators,
> Partitioners, Distributed Cache, etc.). There are quite a few interfaces and it
> will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Document the code, write a careful README and add more
> unit tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature,
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open
--
This message was sent by Atlassian JIRA
(v6.2#6252)