[
https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351740#comment-14351740
]
Hamza Zafar edited comment on VXQUERY-131 at 3/8/15 11:51 AM:
--------------------------------------------------------------
Dear Preston,
My Background:
I am Hamza Zafar, a final-year undergraduate student of computer science at
NUST, Pakistan. I have been a student researcher at the HPC research center in
my department. At the HPC lab we focus on developing and maintaining
MPJ-Express, an open-source Java-based MPI library: http://mpj-express.org/
Open-source Contributions:
For my final-year project, I have been working with Apache Hadoop and
MPJ-Express. The project requires writing a new runtime for MPJ Express that
bootstraps its processes on a Hadoop YARN cluster. The new runtime will use the
Hadoop YARN resource manager to dynamically allocate resources in terms of
memory and CPU. As much enterprise data now resides on the Hadoop Distributed
File System (HDFS), this project will enable enterprises to combine the
performance of HPC with the usability and flexibility of the Big Data stack.
Development of the MPJ Express YARN runtime is complete, and I am currently
working on releasing the software in the next few weeks. A research paper is
currently under review at ICCS (I can send you a soft copy).
My Thoughts about the VXQuery and YARN project:
I have no prior experience with the VXQuery project (I hope to learn it), but
I am comfortable writing YARN applications. I anticipate that this project is
geared towards replacing the Python scripts currently used to launch VXQuery
jobs with the YARN resource manager. YARN can spawn containers in the cluster,
and those containers can then run queries on XML data files residing in HDFS.
The Application Master would be useful for rescheduling failed containers and
maintaining the running ones.
Looking forward to working on this project :)
Yours sincerely,
Hamza Zafar
LinkedIn: pk.linkedin.com/pub/hamza-zafar/59/739/205/
> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
> Key: VXQUERY-131
> URL: https://issues.apache.org/jira/browse/VXQUERY-131
> Project: VXQuery
> Issue Type: Improvement
> Reporter: Preston Carman
> Assignee: Preston Carman
> Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data
> from this source. The project will include creating a strategy (with the
> mentor's guidance) for reading XML data from HDFS and implementing it. When
> connecting VXQuery to HDFS, the strategy may need to consider how to read
> sections of an XML file.
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN
> (Yet Another Resource Negotiator) would be a good cluster management tool for
> VXQuery. If VXQuery can read data from HDFS, then why not also manage the
> cluster with a tool provided by Hadoop? The solution would replace the
> current custom Python scripts for cluster management.
> Goal
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN