[ 
https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende updated BAHIR-67:
---------------------------------
    Description: 
Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster

In today's world of analytics, many use cases need the capability to access 
data from multiple remote data sources in Spark. Though Spark integrates well 
with a local Hadoop cluster, it is sorely lacking in the capability to connect 
to a remote Hadoop cluster. In reality, however, not all enterprise data 
resides in Hadoop, and running the Spark cluster co-located with the Hadoop 
cluster is not always a solution.

In this improvement we propose to create a connector for reading and writing 
data from/to HDFS of a remote Hadoop cluster from Spark, using the WebHDFS API.
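Such a connector would build on the standard WebHDFS REST URL scheme 
(`http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=...`). A minimal sketch of the 
endpoint construction the connector might use is below; the object and method 
names are hypothetical, and only the URL scheme itself comes from the WebHDFS 
specification:

```scala
// Hypothetical helper sketching how a Spark WebHDFS connector could build
// REST URLs for the remote NameNode. Only the URL scheme
// (http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=...) is from the WebHDFS spec;
// everything else here is an illustrative assumption.
object WebHdfsUrls {
  private def base(host: String, port: Int, path: String): String =
    s"http://$host:$port/webhdfs/v1$path"

  // URL for the WebHDFS OPEN operation (read a file over HTTP GET).
  def openUrl(host: String, port: Int, path: String): String =
    s"${base(host, port, path)}?op=OPEN"

  // URL for the WebHDFS CREATE operation (write a file over HTTP PUT).
  def createUrl(host: String, port: Int, path: String): String =
    s"${base(host, port, path)}?op=CREATE&overwrite=true"
}
```

A Spark SQL data source built on this would issue the HTTP requests from the 
executors and expose the fetched file contents as a DataFrame, so no HDFS RPC 
connectivity to the remote cluster is required — only HTTP access to the 
NameNode and DataNodes.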

  was:
In today's world of analytics, many use cases need the capability to access 
data from multiple remote data sources in Spark. Though Spark integrates well 
with a local Hadoop cluster, it is sorely lacking in the capability to connect 
to a remote Hadoop cluster. In reality, however, not all enterprise data 
resides in Hadoop, and running the Spark cluster co-located with the Hadoop 
cluster is not always a solution.

In this improvement we propose to create a connector for reading and writing 
data from/to HDFS of a remote Hadoop cluster from Spark, using the WebHDFS API.

        Summary: WebHDFS Data Source for Spark SQL  (was: Ability to read/write 
data in Spark from/to HDFS of a remote Hadoop Cluster)

> WebHDFS Data Source for Spark SQL
> ---------------------------------
>
>                 Key: BAHIR-67
>                 URL: https://issues.apache.org/jira/browse/BAHIR-67
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark SQL Data Sources
>            Reporter: Sourav Mazumder
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> In today's world of analytics, many use cases need the capability to access 
> data from multiple remote data sources in Spark. Though Spark integrates 
> well with a local Hadoop cluster, it is sorely lacking in the capability to 
> connect to a remote Hadoop cluster. In reality, however, not all enterprise 
> data resides in Hadoop, and running the Spark cluster co-located with the 
> Hadoop cluster is not always a solution.
> In this improvement we propose to create a connector for reading and writing 
> data from/to HDFS of a remote Hadoop cluster from Spark, using the WebHDFS 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
