[jira] [Issue Comment Deleted] (VXQUERY-131) Supporting Hadoop data and cluster management

Till Westmann (JIRA) Thu, 19 Mar 2015 23:56:56 -0700

     [ https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Till Westmann updated VXQUERY-131:
----------------------------------
    Comment: was deleted

(was: Referring to your comment:
Can we do a simple query on HDFS? (Start by reading a local file and transfer 
any additional file blocks as necessary to read the whole XML file. Loses 
efficiency when processing multiple block files.)

This implementation could be pretty straightforward. Hadoop provides the 
FileSystem API to interact with data in HDFS. We can open an FSDataInputStream 
at a given path; if the file spans multiple blocks, they are read 
sequentially (in order).)
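
A minimal sketch of the idea, under stated assumptions: it does not use Hadoop 
at all. Blocks are modelled as plain byte arrays, and the class and method names 
(SequentialBlockRead, splitIntoBlocks, openSequential, countElements) are 
hypothetical. It only illustrates why one stream that returns the blocks in 
order lets an ordinary XML parser ignore block boundaries, even when an element 
is cut in half by one. In a real setup the stream would come from the Hadoop 
FileSystem API instead of openSequential.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SequentialBlockRead {

    // Hypothetical stand-in for file blocks: cut the bytes into fixed-size
    // pieces without any regard for XML structure.
    static List<byte[]> splitIntoBlocks(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            byte[] block = new byte[len];
            System.arraycopy(data, off, block, 0, len);
            blocks.add(block);
        }
        return blocks;
    }

    // Present the blocks as one stream, read strictly in order.
    static InputStream openSequential(List<byte[]> blocks) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] block : blocks) {
            streams.add(new ByteArrayInputStream(block));
        }
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    // Count start tags with the given local name using the JDK's StAX parser.
    static int countElements(InputStream in, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<readings><r id=\"1\"/><r id=\"2\"/><r id=\"3\"/></readings>";
        // A 16-byte block size guarantees that tags straddle block boundaries.
        List<byte[]> blocks =
                splitIntoBlocks(xml.getBytes(StandardCharsets.UTF_8), 16);
        int count = countElements(openSequential(blocks), "r");
        System.out.println(blocks.size() + " blocks, " + count + " <r> elements");
    }
}
```

The parser only ever sees a single InputStream, so the query code needs no 
knowledge of blocks. The cost noted in the quoted comment remains: every block 
has to be brought to the one reader in turn.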

> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
>                 Key: VXQUERY-131
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-131
>             Project: VXQuery
>          Issue Type: Improvement
>            Reporter: Preston Carman
>            Assignee: Preston Carman
>              Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data 
> from this source. The project will include creating a strategy (with the 
> mentor's guidance) for reading XML data from HDFS and implementing it. When 
> connecting VXQuery to HDFS, the strategy may need to consider how to read 
> sections of an XML file. 
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN 
> (Yet Another Resource Negotiator) would be a good cluster management tool for 
> VXQuery. If VXQuery can read data from HDFS, then why not also manage the 
> cluster with a tool provided by Hadoop? The solution would replace the 
> current custom Python scripts for cluster management.
> Goal
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
