Hi! Recently I got pretty excited about the possibility of using PXF outside of its original HAWQ use case. My ultimate wish here is to make PXF available to other Postgres-derived databases, thus connecting them to the Hadoop ecosystem of data sources (think FDW-over-PXF).
With that ambitious goal in mind, I started on a much smaller MVP today and wanted to share my experience with you all. Basically, my goal was to make PXF available to Apache Calcite as a backend (since Calcite itself doesn't provide data storage, data-processing algorithms, or a metadata repository). Calcite comes with a demo that lets you treat a directory full of CSV files as a DB (with individual files being tables), and I wanted to extend that demo to use PXF to read CSV files from HDFS instead:

http://calcite.apache.org/docs/tutorial.html
https://github.com/apache/calcite/tree/master/example/csv/src/main/java/org/apache/calcite/adapter/csv

Being new to using PXF outside of HAWQ, I started looking for any kind of "Standalone PXF" quickstart guide but couldn't find one (please let me know if I missed it). What follows are my notes on what I've been able to do so far. Let me know if they are reasonable and I'll start collecting them on a wiki to help others get going with PXF.

1. My first challenge was to get a local PXF service running. I couldn't find any task that would help me do that, so I did this:
   https://issues.apache.org/jira/browse/HAWQ-1224

2. My next challenge was to figure out the sequence of API calls required to use PXF to read data from a CSV file stored in a local HDFS (an HDFS that happens to be backed by my local filesystem). The problem is that I couldn't find any API quickstart guide that clearly describes the objects PXF manipulates (nouns), what it can do with them (verbs), and, potentially, a state transition diagram to guide client-side writers like myself. Did I miss a doc like that, or should I file a JIRA for it to be created?

3. Even once I figured out some of the calls to make, there's still no client-side library available to translate them into REST calls (or maybe even short-circuit them when running as part of the same JVM as PXF).
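
To make point 3 a bit more concrete, here is a rough strawman of the shape such a client-side library could take. Everything in it is hypothetical: the names `PxfClient`, `listFragments`, `readRows` and `InMemoryClient` are made up for illustration and are not part of PXF, and the in-memory class only stands in for a real transport (REST, or direct calls inside the same JVM), neither of which is shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in PXF today.
public class PxfClientSketch {

    /** The "verbs" a client might need, independent of the transport used. */
    interface PxfClient {
        /** List the pieces (for example, files) that make up a data source. */
        List<String> listFragments(String path);

        /** Read one fragment as rows of string fields. */
        List<String[]> readRows(String fragment);
    }

    /**
     * In-memory stand-in for a transport. A real implementation would either
     * issue REST calls to a PXF service or call into PXF directly when it
     * runs in the same JVM; this class does neither.
     */
    static final class InMemoryClient implements PxfClient {
        private final Map<String, String> files; // path -> CSV text

        InMemoryClient(Map<String, String> files) {
            this.files = files;
        }

        @Override
        public List<String> listFragments(String path) {
            // Assumption: every file under the path prefix is one fragment.
            List<String> result = new ArrayList<>();
            for (String name : files.keySet()) {
                if (name.startsWith(path)) {
                    result.add(name);
                }
            }
            Collections.sort(result);
            return result;
        }

        @Override
        public List<String[]> readRows(String fragment) {
            // Naive CSV split: no quoting or escaping is handled.
            List<String[]> rows = new ArrayList<>();
            for (String line : files.getOrDefault(fragment, "").split("\n")) {
                if (!line.isEmpty()) {
                    rows.add(line.split(",", -1));
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        PxfClient client = new InMemoryClient(
                Map.of("/sales/depts.csv", "10,Sales\n20,Marketing\n"));
        for (String fragment : client.listFragments("/sales/")) {
            for (String[] row : client.readRows(fragment)) {
                System.out.println(String.join("|", row));
            }
        }
    }
}
```

The point of the interface is that a caller such as a Calcite adapter would code against `PxfClient` only, so the REST and same-JVM variants could be swapped without touching the caller.
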
Does this sound like something that needs to be addressed by the PXF community? Shall I file a JIRA?

Thanks,
Roman.
