[jira] [Comment Edited] (VXQUERY-131) Supporting Hadoop data and cluster management

Hamza Zafar (JIRA) Sun, 08 Mar 2015 04:52:19 -0700

    [ 
https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351740#comment-14351740
 ]


Hamza Zafar edited comment on VXQUERY-131 at 3/8/15 11:51 AM:
--------------------------------------------------------------

Dear Preston,

My Background:
I am Hamza Zafar, a final-year undergraduate student of computer science at 
NUST, Pakistan. I have been a student researcher at the HPC research center in 
my department. At the HPC lab we focus on developing and maintaining 
MPJ Express, an open-source Java-based MPI library: http://mpj-express.org/

Open-source Contributions:
For my final-year project, I have been working on the Apache Hadoop and 
MPJ Express projects. The project involves writing a new runtime for 
MPJ Express that bootstraps its processes on a Hadoop YARN cluster. The new 
runtime will use the Hadoop YARN resource manager to dynamically allocate 
memory and CPU resources. As much enterprise data now resides on the Hadoop 
Distributed File System (HDFS), this project will enable enterprises to 
combine the performance of HPC with the usability and flexibility of the 
Big Data stack. Development of the MPJ Express YARN runtime is complete, and 
I am now working on releasing the software in the next few weeks. A research 
paper is currently under review at ICCS (I can send you a soft copy).

My Thoughts about the VXQuery and YARN project:
I do not have any past experience with the VXQuery project (I hope to learn 
it), but I am comfortable writing YARN applications. I anticipate that this 
project is geared towards replacing the Python scripts that launch VXQuery 
jobs with the YARN resource manager. YARN can spawn containers in the 
cluster, and those containers can then run queries on XML data files residing 
in HDFS. The Application Master can be very handy for rescheduling failed 
containers and maintaining the running ones.
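
As a rough illustration of the rescheduling idea above, the following is a 
minimal sketch in plain Python. It does not use the real YARN API: the 
ContainerTask class, the run_with_rescheduling function, the launch callback, 
and the max_attempts limit are all hypothetical names chosen for this 
example. It only shows the control loop an Application Master might follow 
when it relaunches failed containers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ContainerTask:
    """One unit of work, e.g. a query over one HDFS file (hypothetical model)."""
    task_id: str
    attempts: int = 0
    done: bool = False


def run_with_rescheduling(
    tasks: List[ContainerTask],
    launch: Callable[[ContainerTask], bool],
    max_attempts: int = 3,
) -> Dict[str, List[str]]:
    """Launch every task, relaunching failures until max_attempts is reached."""
    pending = list(tasks)
    while pending:
        retry = []
        for task in pending:
            task.attempts += 1
            # launch() stands in for starting a container; True means success.
            task.done = launch(task)
            if not task.done and task.attempts < max_attempts:
                retry.append(task)
        pending = retry
    return {
        "succeeded": [t.task_id for t in tasks if t.done],
        "failed": [t.task_id for t in tasks if not t.done],
    }


def flaky_launch(task: ContainerTask) -> bool:
    # Stand-in launcher: task "q2" fails on its first attempt only.
    return not (task.task_id == "q2" and task.attempts == 1)


tasks = [ContainerTask("q1"), ContainerTask("q2")]
result = run_with_rescheduling(tasks, flaky_launch)
```

A real Application Master would react to container-completion events 
asynchronously instead of looping synchronously, but the bookkeeping 
(count attempts, relaunch on failure, give up after a limit) would be similar.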

Looking forward to working on this project :)

Yours sincerely,
Hamza Zafar
LinkedIn:  pk.linkedin.com/pub/hamza-zafar/59/739/205/ 




> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
>                 Key: VXQUERY-131
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-131
>             Project: VXQuery
>          Issue Type: Improvement
>            Reporter: Preston Carman
>            Assignee: Preston Carman
>              Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data 
> from this source. The project will include creating a strategy (with the 
> mentor's guidance) for reading XML data from HDFS and implementing it. When 
> connecting VXQuery to HDFS, the strategy may need to consider how to read 
> sections of an XML file. 
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN 
> (Yet Another Resource Negotiator) would be a good cluster management tool for 
> VXQuery. If VXQuery can read data from HDFS, then why not also manage the 
> cluster with a tool provided by Hadoop? The solution would replace the 
> current custom Python scripts for cluster management.
> Goals
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN
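
One possible way to "read sections of an XML file", sketched here in plain 
Python under simplifying assumptions, is to borrow the convention Hadoop uses 
for line-oriented input splits: each reader owns the records that start 
inside its byte range and reads past the end of the range to finish the last 
one. The function name read_records_in_split and the record tag "r" are 
hypothetical, the data is an in-memory byte string instead of an HDFS stream, 
and records are assumed to be non-nested elements whose opening tag carries 
no attributes. This is not VXQuery's actual strategy.

```python
from typing import List


def read_records_in_split(data: bytes, start: int, end: int, tag: str) -> List[bytes]:
    """Return every <tag>...</tag> record whose opening tag begins in [start, end)."""
    open_tag = f"<{tag}>".encode()
    close_tag = f"</{tag}>".encode()
    records = []
    pos = data.find(open_tag, start)
    while pos != -1 and pos < end:
        stop = data.find(close_tag, pos)
        if stop == -1:
            break  # unterminated record: stop instead of guessing
        stop += len(close_tag)
        # The record may extend beyond `end`; it still belongs to this split.
        records.append(data[pos:stop])
        pos = data.find(open_tag, stop)
    return records


data = b"<root><r>a</r><r>bb</r><r>c</r></root>"
# Two splits whose boundary (byte 16) falls in the middle of the second record.
first = read_records_in_split(data, 0, 16, "r")
second = read_records_in_split(data, 16, len(data), "r")
```

Because ownership is decided by where a record starts, the two splits 
together cover every record exactly once even though the boundary cuts 
through one of them.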



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
