Hello devs -
I'm doing some reading about HAWQ tablespaces here:
http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
I want to understand the flow of things, please correct me on the following
assumptions:
1) Create a filesystem (not *really* supported after HAWQ init) - the
default is obviously HDFS, via libhdfs3:
SELECT * FROM pg_filesystem;
2) Create a filespace, referencing the above file system:
CREATE FILESPACE testfs ON hdfs
('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
3) Create a tablespace, referencing the above filespace:
CREATE TABLESPACE fastspace FILESPACE testfs;
4) Create objects referencing the above tablespace, or set it as the
database's default:
CREATE DATABASE testdb WITH TABLESPACE=fastspace;
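Putting the steps above together, the end-to-end flow would be something
like the sketch below (the names testfs/fastspace and the HDFS URL are
from the steps above; the final CREATE TABLE is just my own illustration
of per-object placement):

```sql
-- 1) Inspect the registered filesystems (HDFS via libhdfs3 by default):
SELECT * FROM pg_filesystem;

-- 2) Create a filespace on that filesystem:
CREATE FILESPACE testfs ON hdfs
  ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);

-- 3) Create a tablespace backed by the filespace:
CREATE TABLESPACE fastspace FILESPACE testfs;

-- 4) Use the tablespace, either as a database default or per object:
CREATE DATABASE testdb WITH TABLESPACE = fastspace;
CREATE TABLE foo (id int) TABLESPACE fastspace;  -- illustrative only
```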
Given this set of steps, is it true (*in theory*) an arbitrary filesystem
(i.e. storage backend) could be added to HAWQ using *existing* APIs?
I realize the nuances of this are significant, but conceptually I'd like to
gather some details, mainly in support of this
<https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA discussion.
I'm daydreaming about whether this neat tool:
https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
(which also seems to kind of work on Google Cloud, when interoperability
<https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
mode is enabled). By its Linux FUSE nature, it implements the lion's share
of the required pg_filesystem functions; in fact, maybe we could even use
glibc system calls directly in this situation (roughly as described here:
<http://www.linux-mag.com/id/7814/>).
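For a concrete picture of what an S3 spike might look like, mounting a
bucket with s3fs-fuse is roughly the following (bucket name, mount point,
and credentials file are placeholders; the GCS invocation is based on the
interoperability-mode issue comment linked above, so treat the exact
options as an assumption):

```shell
# Credentials for s3fs (placeholder values):
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount an S3 bucket at a hypothetical mount point:
s3fs mybucket /mnt/hawq-s3 -o passwd_file=~/.passwd-s3fs

# Google Cloud Storage in S3 interoperability mode (per the linked issue):
s3fs mybucket /mnt/hawq-s3 -o passwd_file=~/.passwd-s3fs \
  -o url=https://storage.googleapis.com -o sigv2
```

Once mounted, ordinary POSIX calls (open/read/write/close) work against
/mnt/hawq-s3 like any local path, which is why delegating the
pg_filesystem hooks to glibc seems at least plausible.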
Curious to get some feedback.
Thanks,
Kyle
--
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: [email protected]