[ https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Buzolin updated HAWQ-1270:
---------------------------------
    Description: 
Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
already supported, but there is also Ceph, a distributed storage system that 
offers a standard POSIX-compliant file system, object storage, and block 
storage. Ceph is also data-location aware, is written in C++, and is currently 
a more sophisticated storage backend than Hadoop. It provides replicated and 
erasure-coded storage pools. Other great features of Ceph are snapshots and an 
algorithmic approach to mapping data to nodes rather than relying on centrally 
managed namenodes. I don't think HDFS offers any of these features. In terms of 
performance, Ceph should be faster than HDFS since it is written in C++ and 
because it doesn't have scalability limitations when mapping data to storage 
pools, compared to Hadoop, where the namenode is a point of contention.

  was:
Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
would like to propose pluggable storage backend design for Hawq. Hadoop is 
already supported but there is Ceph -  a distributed, storage system which 
offers standard Posix compliant file system, object and a block storage. Ceph 
is also data location aware, written in C++. and is more sophisticated storage 
backend compare to Hadoop at this time. It provides replicated and erasure 
encoded storage pools, Other great features of Ceph is an algorytmic approach 
to map data to the nodes rather than having centrally managed namenodes and 
snapshots. I don't think HDFS offers any of these features. In terms of 
performance, Ceph should be faster than HFDS since it is written on C++ and 
because
it doesn't have scalability limitations when mapping data to storage pools, 
compare to Hadoop, where name node is such point of contention.


> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph, a distributed storage system that 
> offers a standard POSIX-compliant file system, object storage, and block 
> storage. Ceph is also data-location aware, is written in C++, and is 
> currently a more sophisticated storage backend than Hadoop. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph are 
> snapshots and an algorithmic approach to mapping data to nodes rather than 
> relying on centrally managed namenodes. I don't think HDFS offers any of 
> these features. In terms of performance, Ceph should be faster than HDFS 
> since it is written in C++ and because it doesn't have scalability 
> limitations when mapping data to storage pools, compared to Hadoop, where 
> the namenode is a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
