[ 
https://issues.apache.org/jira/browse/HADOOP-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Mosebach updated HADOOP-3999:
---------------------------------

    Attachment: cloud_divide.jpg

- Nodes collect local information (functions, performance indicators, other) via 
plugins; a sketch of such a plugin interface is below.
- We assume the job scheduler knows this information about the (now 
individualized) nodes.
- The cloud can then be logically split into several functional sections, each 
providing some special software or having some special capability.
- The scheduler can now offer different quality levels (service levels, software) 
as well as quantity levels (performance, bandwidth) to the customer.
- Customers can then submit "profiled" jobs and be charged differently depending 
on the profile they submitted.
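
To make this a bit more concrete, here is a rough sketch of what such a node-side 
plugin interface could look like. All class and method names below are made up 
for illustration; nothing like this exists in Hadoop today.

    // Hypothetical interface for node-side capability plugins; each plugin
    // reports one group of facts about the local node as key/value pairs.
    public interface NodeCapabilityPlugin {
        // Short name of the plugin, e.g. "os", "software", "disk-io".
        String getName();
        // Collects the local information, e.g. {"os.name" -> "Linux"}.
        java.util.Map<String, String> collect();
    }

    // Example plugin (separate file) reporting basic OS/architecture info.
    public class OsCapabilityPlugin implements NodeCapabilityPlugin {
        public String getName() { return "os"; }
        public java.util.Map<String, String> collect() {
            java.util.Map<String, String> m = new java.util.HashMap<String, String>();
            m.put("os.name", System.getProperty("os.name"));
            m.put("os.arch", System.getProperty("os.arch"));
            m.put("os.version", System.getProperty("os.version"));
            return m;
        }
    }

The TaskTracker (or a small helper daemon) could run such plugins at startup and 
publish the collected pairs, e.g. into the DFS, so the scheduler can match jobs 
against them.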



> Dynamic host configuration system (via node side plugins)
> ---------------------------------------------------------
>
>                 Key: HADOOP-3999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3999
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: benchmarks, conf, metrics
>         Environment: Any
>            Reporter: Kai Mosebach
>         Attachments: cloud_divide.jpg
>
>
> The MapReduce paradigm is limited to running jobs against the lowest common 
> denominator of all nodes in the cluster.
> On the one hand this is intended (cloud computing: throw simple jobs in, never 
> mind which nodes run them).
> On the other hand it limits the possibilities quite a lot; for instance, if 
> you had data which could or needs to be fed to a 3rd-party interface like 
> MATLAB, R, or BioConductor, you could solve a lot more jobs via Hadoop.
> Furthermore it could be interesting to know the OS, the architecture, and the 
> performance of a node in relation to the rest of the cluster (performance 
> ranking).
> E.g. if the job tracker knew about a sub-cluster of nodes with very high 
> computing performance, or a sub-cluster of nodes with very fast disk I/O, it 
> could select those nodes according to a so-called job profile (e.g. "my job is 
> a heavy computing job / a heavy disk-I/O job"), which a developer can usually 
> estimate beforehand.
> To achieve this, node capabilities could be introduced and stored in the DFS 
> (a rough sketch of such a record follows this list), giving you
> a1.) basic information about each node (OS, architecture)
> a2.) more sophisticated info (additional software, path to the software, 
> version)
> a3.) performance indicators collected about the node (disk I/O, CPU power, 
> memory)
> a4.) network throughput to neighboring hosts, which might allow generating a 
> network performance map over the cluster
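>
> A rough sketch of what such a per-node capability record could look like 
> before it is written to the DFS (class and field names are purely 
> illustrative, not existing Hadoop classes):
>
>     import java.util.Map;
>
>     // Illustrative value object holding the capability record (a1-a4) of one node.
>     public class NodeCapabilities {
>         public String hostname;
>         public String os;                    // a1: e.g. "Linux"
>         public String arch;                  // a1: e.g. "x86_64"
>         public Map<String, String> software; // a2: software name -> path / version
>         public double diskIoMBPerSec;        // a3: measured disk throughput
>         public double cpuScore;              // a3: relative CPU ranking
>         public long memoryBytes;             // a3: physical memory
>         public Map<String, Double> neighborMBPerSec; // a4: neighbor host -> throughput
>     }
>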
> This would allow you to
> b1.) generate jobs that have a profile (computing-intensive, disk-I/O-intensive, 
> net-I/O-intensive); see the sketch at the end of this description
> b2.) generate jobs that have software dependencies (run on Linux only, run on 
> nodes with MATLAB only)
> b3.) generate a performance map of the cluster (sub-clusters of fast-disk 
> nodes, sub-clusters of fast-CPU nodes, a network-speed-relation map between 
> nodes)
> From step b3.) you could then even derive statistical information which could 
> in turn be fed to the DFS NameNode, to see whether data could be stored on 
> fast-disk sub-clusters only (that might need to be a tool outside of Hadoop 
> core, though).
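>
> To illustrate, a job profile could then be declared as plain job 
> configuration, roughly like this (the "job.profile.*" keys are made up and do 
> not exist in Hadoop; only JobConf/JobClient are real):
>
>     import java.io.IOException;
>     import org.apache.hadoop.mapred.JobClient;
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class ProfiledJobSubmit {
>         public static void main(String[] args) throws IOException {
>             JobConf conf = new JobConf(ProfiledJobSubmit.class);
>             // ... mapper/reducer and input/output setup omitted ...
>             conf.set("job.profile.type", "disk-io");        // b1: heavy disk-I/O job
>             conf.set("job.profile.requires.os", "Linux");   // b2: OS dependency
>             conf.set("job.profile.requires.software", "R"); // b2: needs R installed
>             JobClient.runJob(conf);
>         }
>     }
>
> The scheduler could then match these keys against the capability records from 
> a1.)-a4.) when selecting nodes for the job.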

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
