Hi!

Recently I got pretty excited about the possibility of using
PXF outside of its original HAWQ use case. My ultimate
wish here is to make PXF available to other Postgres-derived
databases, thus connecting them to the Hadoop ecosystem of
data sources (think FDW-over-PXF).

With that ambitious goal in mind, I started on a much smaller
MVP today and wanted to share my experience with you all.

Basically, my goal was to make PXF available to Apache Calcite
as a backend (since Calcite itself doesn't deal with storage of data,
algorithms to process data, or a repository for storing metadata).
Calcite comes with a demo that allows you to treat a directory
full of CSV files as a DB (with individual files being tables), and
I wanted to extend that demo to use PXF to read CSV files from HDFS
instead:
  http://calcite.apache.org/docs/tutorial.html
  https://github.com/apache/calcite/tree/master/example/csv/src/main/java/org/apache/calcite/adapter/csv

Being new to using PXF outside of HAWQ, I started looking
for any kind of "Standalone PXF" quick start guide but couldn't find
one (please let me know if I missed it). What follows are my notes on
what I've been able to do so far. Let me know if they are reasonable
and I'll start collecting them on a wiki to help others get going with PXF.

1. My first challenge was to get a local PXF service running. I couldn't find
any task that would help me do that so I did this:
    https://issues.apache.org/jira/browse/HAWQ-1224

2. My next challenge was to figure out the sequence of API calls
that would be required to use PXF to read data from a CSV file stored
in a local HDFS (an HDFS that happens to be backed by my local filesystem).
The problem is that I couldn't find any API quick start guide that would
clearly describe the objects that PXF manipulates (nouns), what it can do
with them (verbs) and, potentially, a state transition diagram to guide
client-side writers like myself. Did I miss a doc like that, or should
I file a JIRA for it to be created?

3. Even once I had figured out some of the calls to make, there's still no
client-side library available to translate those into REST calls (or maybe
even short-circuit them when running as part of the same JVM as PXF). Does
this sound like something that needs to be addressed by the PXF community?
Shall I file a JIRA?
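
To make point 3 a bit more concrete, here is a rough sketch of the kind of
thin client-side library I have in mind. It only builds descriptions of the
HTTP requests and never talks to a server. Please treat every specific in it
as an assumption on my part rather than documented PXF behaviour: the class
and method names (PxfClient, list_fragments, read_fragment) are hypothetical,
and the port, the "v14" version segment, the endpoint paths, the X-GP-*
header names, the profile name and the sample file path are unverified
guesses.

```python
from typing import Dict, NamedTuple
from urllib.parse import urlencode


class PxfRequest(NamedTuple):
    """A plain description of an HTTP call; nothing is sent over the wire."""
    method: str
    url: str
    headers: Dict[str, str]


class PxfClient:
    """Hypothetical thin client that maps PXF 'verbs' onto REST requests."""

    def __init__(self, host="localhost", port=51200, version="v14",
                 profile="HdfsTextSimple"):
        # ASSUMPTION: port, protocol version and profile name are guesses.
        self.base = f"http://{host}:{port}/pxf/{version}"
        self.profile = profile

    def _headers(self, path):
        # ASSUMPTION: PXF takes its parameters as X-GP-* HTTP headers.
        return {
            "X-GP-DATA-DIR": path,
            "X-GP-PROFILE": self.profile,
            "X-GP-FORMAT": "TEXT",
        }

    def list_fragments(self, path):
        """Verb 1: ask which fragments (splits) make up the file at `path`."""
        query = urlencode({"path": path})
        # ASSUMPTION: the endpoint path below is unverified.
        url = f"{self.base}/Fragmenter/getFragments?{query}"
        return PxfRequest("GET", url, self._headers(path))

    def read_fragment(self, path, fragment_index):
        """Verb 2: ask for the rows of one fragment of the file at `path`."""
        headers = self._headers(path)
        # ASSUMPTION: the fragment is selected through this header.
        headers["X-GP-DATA-FRAGMENT"] = str(fragment_index)
        return PxfRequest("GET", f"{self.base}/Bridge/", headers)


if __name__ == "__main__":
    client = PxfClient()
    sample = "/data/sales/DEPTS.csv"  # hypothetical HDFS path
    for request in (client.list_fragments(sample),
                    client.read_fragment(sample, 0)):
        print(request.method, request.url)
        for name, value in sorted(request.headers.items()):
            print(f"  {name}: {value}")
```

Keeping request construction separate from transport is deliberate: the same
PxfRequest could be handed to an HTTP library or, in a JVM-based equivalent,
dispatched directly to PXF classes to get the short-circuit behaviour
mentioned above.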

Thanks,
Roman.
