[
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Buzolin updated HAWQ-1270:
---------------------------------
Description:
Since HAWQ depends only on Hadoop and Parquet for columnar format support, I
would like to propose a pluggable storage backend design for HAWQ. Hadoop is
already supported, but there is also Ceph, a distributed storage system that
offers a standard POSIX-compliant file system, object storage, and block
storage. Ceph is also data-location aware, is written in C++, and is a more
sophisticated storage backend than Hadoop at this time. It provides replicated
and erasure-coded storage pools. Other great features of Ceph are snapshots and
an algorithmic approach to mapping data to nodes rather than relying on
centrally managed namenodes. I don't think HDFS offers any of these features. In
terms of performance, Ceph should be faster than HDFS, since it is written in
C++ and because it doesn't have scalability limitations when mapping data to
storage pools, compared to Hadoop, where the namenode is a point of contention.
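To illustrate what an algorithmic approach to data placement means, here is a
minimal sketch. It uses rendezvous (highest-random-weight) hashing as a
simplified stand-in; it is not Ceph's actual placement algorithm, and the
function name, node names, and object name are hypothetical. The point is only
that any client can compute where an object lives from its name and the node
list, with no central lookup service.

```python
import hashlib


def placement(object_name, nodes, replicas=3):
    """Pick `replicas` nodes for an object by hashing, with no lookup table.

    Simplified illustration (rendezvous hashing), not Ceph's real algorithm.
    """
    def score(node):
        # Each (node, object) pair gets a deterministic pseudo-random score.
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the object's replicas.
    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical cluster and object name, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(placement("hawq/table1/segment0", nodes))
```

Every client that knows the same node list computes the same answer, and
removing a node that holds no replica of an object leaves that object's
placement unchanged.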
was:
Since HAWQ only depends on Hadoop and Parquet for columnar format support, I
would like to propose pluggable storage backend design for Hawq. Hadoop is
already supported but there is Ceph - a distributed, storage system which
offers standard Posix compliant file system, object and a block storage. Ceph
is also data location aware, written in C++. and is more sophisticated storage
backend compare to Hadoop at this time. It provides replicated and erasure
encoded storage pools, Other great features of Ceph is an algorytmic approach
to map data to the nodes rather than having centrally managed namenodes and
snapshots. I don't think HDFS offers any of these features. In terms of
performance, Ceph should be faster than HFDS since it is written on C++ and
because
it doesn't have scalability limitations when mapping data to storage pools,
compare to Hadoop, where name node is such point of contention.
> Plugged storage back-ends for HAWQ
> ----------------------------------
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
> Issue Type: Improvement
> Reporter: Dmitry Buzolin
> Assignee: Ed Espino
>
> Since HAWQ depends only on Hadoop and Parquet for columnar format support, I
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is
> already supported, but there is also Ceph, a distributed storage system that
> offers a standard POSIX-compliant file system, object storage, and block
> storage. Ceph is also data-location aware, is written in C++, and is a more
> sophisticated storage backend than Hadoop at this time. It provides replicated
> and erasure-coded storage pools. Other great features of Ceph are snapshots
> and an algorithmic approach to mapping data to nodes rather than relying on
> centrally managed namenodes. I don't think HDFS offers any of these features.
> In terms of performance, Ceph should be faster than HDFS, since it is written
> in C++ and because it doesn't have scalability limitations when mapping data
> to storage pools, compared to Hadoop, where the namenode is a point of
> contention.