[
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Buzolin updated HAWQ-1270:
---------------------------------
Description: Since HAWQ only depends on Hadoop and Parquet for columnar
format support, I would like to propose a pluggable storage backend design for
HAWQ. Hadoop is already supported, but there is also Ceph, a distributed
storage system that offers a standard POSIX-compliant file system as well as
object and block storage. Ceph is also data-location aware, is written in C++,
and is a more sophisticated storage backend than Hadoop at this time. It
provides replicated and erasure-coded storage pools. Other great features of
Ceph are snapshots and an algorithmic approach to mapping data to nodes rather
than relying on centrally managed namenodes. I don't think HDFS offers any of
these features. In terms of performance, Ceph should be faster than HDFS, since
it is written in C++ and has no scalability limitations when mapping data to
storage pools, compared to Hadoop, where the namenode is a point of contention.
> Plugged storage back-ends for HAWQ
> ----------------------------------
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
> Issue Type: Improvement
> Reporter: Dmitry Buzolin
> Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is
> already supported, but there is also Ceph, a distributed storage system that
> offers a standard POSIX-compliant file system as well as object and block
> storage. Ceph is also data-location aware, is written in C++, and is a more
> sophisticated storage backend than Hadoop at this time. It provides
> replicated and erasure-coded storage pools. Other great features of Ceph are
> snapshots and an algorithmic approach to mapping data to nodes rather than
> relying on centrally managed namenodes. I don't think HDFS offers any of
> these features. In terms of performance, Ceph should be faster than HDFS,
> since it is written in C++ and has no scalability limitations when mapping
> data to storage pools, compared to Hadoop, where the namenode is a point of
> contention.