[ 
https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064690#comment-14064690
 ] 

Artem Tsikiridis edited comment on FLINK-838 at 7/17/14 8:05 AM:
-----------------------------------------------------------------

Hi.

Actually yes, Fabian, you are right: the result is the same in Hadoop.

The issue is that {{tasktracker.maximum.reduce.tasks}} is not really the number 
of tasks that should be run in parallel but, as its name implies, the maximum. 
The number of reducers Hadoop actually runs is different and is defined per 
job. We should still use the tasktracker's maximum conf variables as an upper 
bound, I guess, but then what I did wasn't correct...

{{getNumReduceTasks}} and potentially {{getNumTasksToExecutePerJvm}} are what 
we need. I'll let you know when I have it working correctly.
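
A minimal sketch of the idea, in plain Java with no Hadoop dependency so it 
can be run on its own. The class {{ReduceParallelism}} and the method 
{{effective}} are hypothetical names used only for illustration. The first 
argument stands in for the per-job value that {{getNumReduceTasks}} returns, 
and the second for the tasktracker maximum. Treating a non-positive maximum as 
"no bound" is an assumption of this sketch, not something Hadoop defines.

```java
public final class ReduceParallelism {

    private ReduceParallelism() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Picks the reduce parallelism for a job (hypothetical helper).
     *
     * @param numReduceTasks per-job reducer count, i.e. the value that
     *                       getNumReduceTasks would report
     * @param maxReduceTasks tasktracker maximum, used only as an upper bound;
     *                       a value <= 0 is treated as "no bound" (assumption)
     * @return the number of reducers to run in parallel
     */
    public static int effective(int numReduceTasks, int maxReduceTasks) {
        if (numReduceTasks < 0) {
            throw new IllegalArgumentException(
                    "numReduceTasks must be >= 0, got " + numReduceTasks);
        }
        if (numReduceTasks == 0) {
            // A job with zero reducers is map-only: nothing to run.
            return 0;
        }
        if (maxReduceTasks <= 0) {
            // No usable upper bound configured, so take the per-job value.
            return numReduceTasks;
        }
        // The per-job value decides; the maximum only caps it.
        return Math.min(numReduceTasks, maxReduceTasks);
    }

    public static void main(String[] args) {
        System.out.println(effective(4, 2)); // capped by the maximum
        System.out.println(effective(1, 8)); // per-job value wins
        System.out.println(effective(0, 8)); // map-only job
    }
}
```

The point of the sketch is only the ordering of the two settings: the per-job 
value determines the parallelism, and the maximum never raises it.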

Thanks.


> GSoC Summer Project: Implement full Hadoop Compatibility Layer for 
> Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a 
> full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: 
> https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: 
> https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's 
> Configuration to the one of Stratosphere. By successfully bridging the Hadoop 
> tasks with Stratosphere, we already cover the most basic Hadoop Jobs. This 
> can be determined by running some popular Hadoop examples on Stratosphere 
> (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how running these jobs works (e.g. command line 
> interface) for the wrapper. Implement how the user will run them. (1 - 2 
> weeks).
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, 
> Partitioners, Distributed Cache etc.) There are quite a few interfaces and it 
> will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more 
> unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)
