Hi Kyle,

Good investigation!
I think we can start by adding a tuple to pg_filesystem, similar to the existing hdfs one, and then implement every API that tuple introduces by calling the FUSE API.

However, HAWQ was designed for HDFS, which means it assumes an append-only file system. So when we support other types of file systems, we should investigate how to handle the performance and transaction issues. Performance can be investigated after we implement a demo, but the transaction issue should be decided beforehand: an append-only file system doesn't support UPDATE in place, and inserted data is tracked by the file length recorded in pg_aoseg.pg_aoseg_xxxxx or pg_parquet.pg_parquet_xxxxx.

Thanks.

On Tue, Mar 14, 2017 at 7:57 AM, Kyle Dunn <[email protected]> wrote:

> Hello devs -
>
> I'm doing some reading about HAWQ tablespaces here:
> http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
>
> I want to understand the flow of things, please correct me on the following
> assumptions:
>
> 1) Create a filesystem (not *really* supported after HAWQ init) - the
> default is obviously [lib]HDFS[3]:
>      SELECT * FROM pg_filesystem;
>
> 2) Create a filespace, referencing the above file system:
>      CREATE FILESPACE testfs ON hdfs
>      ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
>
> 3) Create a tablespace, referencing the above filespace:
>      CREATE TABLESPACE fastspace FILESPACE testfs;
>
> 4) Create objects referencing the above tablespace, or set it as the
> database's default:
>      CREATE DATABASE testdb WITH TABLESPACE=fastspace;
>
> Given this set of steps, is it true (*in theory*) an arbitrary filesystem
> (i.e. storage backend) could be added to HAWQ using *existing* APIs?
>
> I realize the nuances of this are significant, but conceptually I'd like to
> gather some details, mainly in support of this
> <https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA discussion.
> I'm daydreaming about whether this neat tool:
> https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
> (which also seems to kind of work on Google Cloud, when interoperability
> <https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
> mode is enabled). By its Linux FUSE nature, it implements the lion's share
> of required pg_filesystem functions; in fact, maybe we could actually use
> system calls from glibc (somewhat <http://www.linux-mag.com/id/7814/>)
> directly in this situation.
>
> Curious to get some feedback.
>
>
> Thanks,
> Kyle
> --
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 <3039053171> | Email: [email protected]
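
P.S. To make the transaction point in my reply concrete, here is a minimal sketch of the "visibility by recorded file length" idea, written against ordinary POSIX-style file calls such as a FUSE mount would expose. It is a toy model, not HAWQ code: the class AppendOnlySegment and its methods are hypothetical names, the in-memory committed_eof only stands in for the length kept in pg_aoseg, and truncating on abort is my assumption about how uncommitted bytes would be discarded.

```python
import os
import tempfile


class AppendOnlySegment:
    """Toy append-only segment file (hypothetical, for illustration only)."""

    def __init__(self, path):
        self.path = path
        # Stand-in for the file length recorded in the catalog (pg_aoseg).
        self.committed_eof = 0
        # Create the file if it does not exist yet.
        open(path, "ab").close()

    def append(self, data):
        """Append bytes at the physical end; return the new physical length.

        The new bytes are not visible to readers until commit() is called.
        """
        with open(self.path, "ab") as f:
            f.write(data)
            f.flush()
            return os.fstat(f.fileno()).st_size

    def commit(self, new_eof):
        """Make everything up to new_eof visible by recording the length."""
        self.committed_eof = new_eof

    def abort(self):
        """Discard uncommitted bytes (assumption: truncate to committed length)."""
        os.truncate(self.path, self.committed_eof)

    def read_visible(self):
        """Readers only ever see bytes below the committed length."""
        with open(self.path, "rb") as f:
            return f.read(self.committed_eof)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    seg = AppendOnlySegment(os.path.join(workdir, "segfile"))
    eof = seg.append(b"row1")
    print(seg.read_visible())  # nothing committed yet
    seg.commit(eof)
    print(seg.read_visible())  # committed data is now visible
```

The point of the sketch is that the backend needs a reliable append and some way to drop or ignore the tail after an abort; an in-place UPDATE is never required. Whether a given FUSE-backed store offers those two operations cheaply is what we would need to decide before the demo.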
