[jira] [Commented] (VXQUERY-131) Supporting Hadoop data and cluster management

Preston Carman (JIRA) Mon, 09 Mar 2015 18:06:07 -0700

    [ 
https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354018#comment-14354018
 ]


Preston Carman commented on VXQUERY-131:
----------------------------------------

VXQuery has its own cluster job management with Hyracks. The project does not 
use MapReduce. While the project does require data to be read from HDFS, once 
the data is in VXQuery, VXQuery will process it and produce the result 
independently of Hadoop. So only look at how to read data from HDFS. 
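
The issue description mentions that the strategy may need to read sections of 
an XML file. As a rough illustration only, here is a minimal sketch in plain 
Java with no Hadoop dependency. The class name, the method name, and the rule 
"a reader owns every record whose opening tag starts inside its range" are 
assumptions for the example, not an agreed design. An in-memory string stands 
in for the HDFS file, and the sketch assumes flat records without attributes 
or nesting of the same tag.

```java
import java.util.ArrayList;
import java.util.List;

class XmlSplitReader {
    // Hypothetical helper: returns every <tag>...</tag> record whose opening
    // tag starts in [start, end). A record that starts inside the range is
    // read to completion even if it runs past `end`; a record that started
    // before `start` belongs to the reader of the previous range.
    static List<String> readRecords(String data, int start, int end, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int pos = data.indexOf(open, start);
        while (pos >= 0 && pos < end) {
            int stop = data.indexOf(close, pos);
            if (stop < 0) {
                break; // truncated record: nothing more to emit
            }
            stop += close.length();
            records.add(data.substring(pos, stop));
            pos = data.indexOf(open, stop);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "<r><e>a</e><e>bb</e><e>c</e></r>";
        int mid = data.length() / 2;
        // Two readers, each given one half of the file.
        System.out.println(readRecords(data, 0, mid, "e"));
        System.out.println(readRecords(data, mid, data.length(), "e"));
    }
}
```

With this rule the record that straddles the midpoint is emitted once, by the 
reader of the first half, and the second reader skips ahead to the next 
opening tag.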

Executing a VXQuery job or query could be managed through YARN. Just as you 
execute a MapReduce job on a Hadoop cluster, you could do the same for a 
VXQuery job. 

Once the basic YARN job is ready and you can read data from HDFS, you can 
optimize the job to make sure the data read and initial query processing 
happen locally on each node.
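
To make the locality idea concrete, here is a minimal sketch in plain Java 
with no Hadoop or Hyracks dependency. Everything in it is hypothetical: the 
class and method names, the split and node identifiers, and the greedy 
"least-loaded replica holder first" policy are assumptions for illustration. 
In a real job the replica locations would come from HDFS.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocalityPlanner {
    // Hypothetical planner: maps each split to a worker node. A node that
    // holds a replica of the split is preferred (least loaded first); if no
    // worker holds a replica, the least-loaded worker overall is used.
    static Map<String, String> assign(Map<String, List<String>> replicaHosts,
                                      List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, Integer> load = new HashMap<>();
        for (String node : nodes) {
            load.put(node, 0);
        }
        Map<String, String> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : replicaHosts.entrySet()) {
            String best = null;
            // First pass: only workers that store a replica of this split.
            for (String host : entry.getValue()) {
                if (load.containsKey(host)
                        && (best == null || load.get(host) < load.get(best))) {
                    best = host;
                }
            }
            // Fallback: no local replica, so the read will be remote.
            if (best == null) {
                for (String node : nodes) {
                    if (best == null || load.get(node) < load.get(best)) {
                        best = node;
                    }
                }
            }
            plan.put(entry.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return plan;
    }
}
```

The fallback branch is where a split would be read over the network, so the 
number of splits that reach it is a simple measure of how local the plan is.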

> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
>                 Key: VXQUERY-131
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-131
>             Project: VXQuery
>          Issue Type: Improvement
>            Reporter: Preston Carman
>            Assignee: Preston Carman
>              Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data 
> from this source. The project will include creating a strategy (with the 
> mentor's guidance) for reading XML data from HDFS and implementing it. When 
> connecting VXQuery to HDFS, the strategy may need to consider how to read 
> sections of an XML file. 
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN 
> (Yet Another Resource Negotiator) would be a good cluster management tool for 
> VXQuery. If VXQuery can read data from HDFS, then why not also manage the 
> cluster with a tool provided by Hadoop? The solution would replace the 
> current custom Python scripts for cluster management.
> Goals
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
