[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710156#comment-15710156 ]
Christian Kadner commented on BAHIR-67:
---------------------------------------

It looks like the only method we would need to override is {{WebHdfsFileSystem#toUrl}}:

{code:title=org.apache.hadoop.hdfs.web.WebHdfsFileSystem|borderStyle=solid}
URL toUrl(final HttpOpParam.Op op, final Path fspath,
    final Param<?,?>... parameters) throws IOException {
  // initialize URI path and query
  final String path = PATH_PREFIX   // PATH_PREFIX = "/webhdfs/v1"
      + (fspath == null ? "/" : makeQualified(fspath).toUri().getRawPath());
  final String query = op.toQueryString()
      + Param.toSortedString("&", getAuthParameters(op))
      + Param.toSortedString("&", parameters);
  final URL url = getNamenodeURL(path, query);
  LOG.trace("url={}", url);
  return url;
}
{code}

> WebHDFS Data Source for Spark SQL
> ---------------------------------
>
>                 Key: BAHIR-67
>                 URL: https://issues.apache.org/jira/browse/BAHIR-67
>             Project: Bahir
>          Issue Type: New Feature
>          Components: Spark SQL Data Sources
>            Reporter: Sourav Mazumder
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Ability to read/write data in Spark from/to the HDFS of a remote Hadoop cluster.
> In today's world of analytics, many use cases need the capability to access data from multiple remote data sources in Spark. While Spark integrates well with a local Hadoop cluster, it lacks support for connecting to a remote one. In practice, not all enterprise data resides in Hadoop, and co-locating the Spark cluster with the Hadoop cluster is not always a viable solution.
> In this improvement we propose to create a connector for accessing data (read and write) from/to the HDFS of a remote Hadoop cluster from Spark, using the WebHDFS API.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
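To illustrate the comment above, here is a minimal, self-contained sketch of the URL assembly that {{WebHdfsFileSystem#toUrl}} performs (prefix the file path with {{/webhdfs/v1}}, then append the operation and parameters as a query string). This is not Hadoop code; the class and method names ({{WebHdfsUrlSketch}}, {{buildWebHdfsUrl}}) and the example host/port are hypothetical, and it uses only the JDK's {{java.net.URL}}:

{code:title=WebHdfsUrlSketch.java|borderStyle=solid}
import java.net.MalformedURLException;
import java.net.URL;

// Standalone sketch of WebHDFS request-URL construction; names are illustrative.
public class WebHdfsUrlSketch {

    // Same REST path prefix that WebHdfsFileSystem uses.
    static final String PATH_PREFIX = "/webhdfs/v1";

    // Assemble a WebHDFS URL from a namenode host/port, a file path,
    // an operation name, and optional extra key=value parameters.
    static URL buildWebHdfsUrl(String host, int port, String fsPath,
                               String op, String... params) throws MalformedURLException {
        String path = PATH_PREFIX + (fsPath == null ? "/" : fsPath);
        StringBuilder query = new StringBuilder("op=").append(op);
        for (String p : params) {
            query.append('&').append(p);
        }
        return new URL("http", host, port, path + "?" + query);
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical remote namenode and file; prints the assembled URL, e.g.
        // http://remote-namenode.example.com:50070/webhdfs/v1/user/alice/data.csv?op=OPEN&user.name=alice
        URL url = buildWebHdfsUrl("remote-namenode.example.com", 50070,
                                  "/user/alice/data.csv", "OPEN", "user.name=alice");
        System.out.println(url);
    }
}
{code}

A subclass override of {{toUrl}} would perform essentially this construction while substituting its own namenode address, which is what makes the method a natural hook for pointing Spark at a remote cluster.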