[ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Buzolin updated HAWQ-1270:
---------------------------------
    Description: Since HAWQ only depends on Hadoop and Parquet for columnar
format support, I would like to propose a pluggable storage backend design for
HAWQ. Hadoop is already supported, but there is also Ceph, a distributed
storage system that offers a standard POSIX-compliant file system, object
storage, and block storage. Ceph is also data-location aware, is written in
C++, and is a more sophisticated storage backend compared to Hadoop at this
time. It provides replicated and erasure-coded storage pools. Other great
features of Ceph are snapshots and an algorithmic approach to mapping data to
nodes rather than relying on centrally managed namenodes. I don't think HDFS
offers any of these features. In terms of performance, Ceph should be faster
than HDFS since it is written in C++ and because it doesn't have scalability
limitations when mapping data to storage pools, compared to Hadoop, where the
name node is such a point of
contention.

> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is
> already supported, but there is also Ceph, a distributed storage system that
> offers a standard POSIX-compliant file system, object storage, and block
> storage. Ceph is also data-location aware, is written in C++, and is a more
> sophisticated storage backend compared to Hadoop at this time. It provides
> replicated and erasure-coded storage pools. Other great features of Ceph are
> snapshots and an algorithmic approach to mapping data to nodes rather than
> relying on centrally managed namenodes. I don't think HDFS offers any of
> these features. In terms of performance, Ceph should be faster than HDFS
> since it is written in C++ and because it doesn't have scalability
> limitations when mapping data to storage pools, compared to Hadoop, where the
> name node is such a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
