This is an automated email from the ASF dual-hosted git repository.

djwang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/cloudberry-pxf.git


The following commit(s) were added to refs/heads/main by this push:
     new 0376ea75 docs - pxf works with ORC using Foreign Data Wrapper (#52)
0376ea75 is described below

commit 0376ea75226a1dbd73f68af8df5278424a3c47af
Author: Nikolay Antonov <[email protected]>
AuthorDate: Thu Feb 5 12:50:26 2026 +0500

    docs - pxf works with ORC using Foreign Data Wrapper (#52)
    
    Add examples on how to use PXF foreign data wrappers to access ORC files on S3/HDFS.
---
 .../master_middleman/source/subnavs/pxf-subnav.erb |  4 +-
 docs/content/hdfs_orc.html.md.erb                  | 57 ++++++++++++++++++++--
 docs/content/objstore_avro.html.md.erb             |  2 +-
 docs/content/objstore_orc.html.md.erb              | 47 +++++++++++++++++-
 4 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/docs/book/master_middleman/source/subnavs/pxf-subnav.erb b/docs/book/master_middleman/source/subnavs/pxf-subnav.erb
index f2d178c3..fbf2c28f 100644
--- a/docs/book/master_middleman/source/subnavs/pxf-subnav.erb
+++ b/docs/book/master_middleman/source/subnavs/pxf-subnav.erb
@@ -64,7 +64,7 @@
           <li><a href="/pxf/WIP/hdfs_text.html" format="markdown">Reading and 
Writing Text Data</a></li>
           <li><a href="/pxf/WIP/hdfs_avro.html" format="markdown">Reading and 
Writing Avro Data</a></li>
           <li><a href="/pxf/WIP/hdfs_json.html" format="markdown">Reading JSON 
Data</a></li>
-          <li><a href="/pxf/WIP/hdfs_orc.html" format="markdown">Reading ORC 
Data</a></li>
+          <li><a href="/pxf/WIP/hdfs_orc.html" format="markdown">Reading and 
Writing ORC Data</a></li>
           <li><a href="/pxf/WIP/hdfs_parquet.html" format="markdown">Reading 
and Writing Parquet Data</a></li>
           <li><a href="/pxf/WIP/hdfs_seqfile.html" format="markdown">Reading 
and Writing SequenceFile Data</a></li>
           <li><a href="/pxf/WIP/hdfs_fileasrow.html" format="markdown">Reading 
a Multi-Line Text File into a Single Table Row</a></li>
@@ -79,7 +79,7 @@
           <li><a href="/pxf/WIP/objstore_text.html" format="markdown">Reading 
and Writing Text Data</a></li>
           <li><a href="/pxf/WIP/objstore_avro.html" format="markdown">Reading 
and Writing Avro Data</a></li>
           <li><a href="/pxf/WIP/objstore_json.html" format="markdown">Reading 
JSON Data</a></li>
-          <li><a href="/pxf/WIP/objstore_orc.html" format="markdown">Reading 
ORC Data</a></li>
+          <li><a href="/pxf/WIP/objstore_orc.html" format="markdown">Reading 
and Writing ORC Data</a></li>
           <li><a href="/pxf/WIP/objstore_parquet.html" 
format="markdown">Reading and Writing Parquet Data</a></li>
           <li><a href="/pxf/WIP/objstore_seqfile.html" 
format="markdown">Reading and Writing SequenceFile Data</a></li>
           <li><a href="/pxf/WIP/objstore_fileasrow.html" 
format="markdown">Reading a Multi-Line Text File into a Single Table 
Row</a></li>
diff --git a/docs/content/hdfs_orc.html.md.erb b/docs/content/hdfs_orc.html.md.erb
index f429ee2b..69b248cc 100644
--- a/docs/content/hdfs_orc.html.md.erb
+++ b/docs/content/hdfs_orc.html.md.erb
@@ -34,7 +34,7 @@ ORC is type-aware and specifically designed for Hadoop workloads. ORC files stor
 
 ORC also supports predicate pushdown with built-in indexes at the file, stripe, and row levels, moving the filter operation to the data loading phase.
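+
+For example, a comparison filter on a primitive column is a candidate for pushdown into the ORC reader (a minimal sketch, using the `sample_orc` table defined in the examples below):
+
+``` sql
+-- PXF may evaluate this predicate during the data loading phase,
+-- using ORC file/stripe/row indexes to skip non-matching data.
+SELECT location, total_sales
+FROM sample_orc
+WHERE num_orders > 1000;
+```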
 
-Refer to the [Apache orc](https://orc.apache.org/docs/) documentation for detailed information about the ORC file format.
+Refer to the [Apache ORC](https://orc.apache.org/docs/) documentation for detailed information about the ORC file format.
 
 
 ## <a id="datatype_map"></a>Data Type Mapping
@@ -137,7 +137,7 @@ FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export')
 [DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
 ```
 
-The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL TABLE](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/ref_guide-sql_commands-CREATE_EXTERNAL_TABLE.html) command are described below.
+The specific keywords and values used in the Apache Cloudberry [CREATE EXTERNAL TABLE](https://cloudberry.apache.org/docs/sql-stmts/create-external-table) command are described below.
 
 | Keyword  | Value |
 |-------|-------------------------------------|
@@ -148,6 +148,31 @@ The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL
 | FORMAT 'CUSTOM' | Use `FORMAT 'CUSTOM'` with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read).   |
 | DISTRIBUTED BY | If you want to load data from an existing Greenplum Database table into the writable external table, consider specifying the same distribution policy or `<column_name>` on both tables. Doing so will avoid extra motion of data between segments on the load operation.  |
 
+## <a id="profile_cfdw"></a>Creating the Foreign Table
+
+The PXF `hdfs_pxf_fdw` foreign data wrapper supports reading and writing ORC-formatted HDFS files. When you insert records into a foreign table, the block(s) of data that you insert are written to one file per segment in the directory that you specify in the `resource` option.
+
+Use the following syntax to create an Apache Cloudberry foreign table that references an HDFS file or directory:
+
+``` sql
+CREATE SERVER <foreign_server> FOREIGN DATA WRAPPER hdfs_pxf_fdw;
+CREATE USER MAPPING FOR <user_name> SERVER <foreign_server>;
+
+CREATE FOREIGN TABLE [ IF NOT EXISTS ] <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+  SERVER <foreign_server>
+  OPTIONS ( resource '<path-to-hdfs-file>', format 'orc' [, <custom-option> '<value>' [, ...] ]);
+```
+
+The specific keywords and values used in the Apache Cloudberry [CREATE FOREIGN TABLE](https://cloudberry.apache.org/docs/sql-stmts/create-foreign-table) command are described below.
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| \<foreign_server\>    | The named server configuration that PXF uses to access the data. You can override credentials in the `CREATE SERVER` statement as described in [Overriding the S3 Server Configuration for Foreign Tables](access_s3.html#s3_override_fdw) |
+| resource \<path&#8209;to&#8209;hdfs&#8209;file\>    | The path to the file or directory in the HDFS data store. When the `<server_name>` configuration includes a [`pxf.fs.basePath`](cfg_server.html#pxf-fs-basepath) property setting, PXF considers \<path&#8209;to&#8209;hdfs&#8209;file\> to be relative to the base path specified. Otherwise, PXF considers it to be an absolute path. \<path&#8209;to&#8209;hdfs&#8209;file\> must not specify a relative path nor include the dollar sign (`$`) character. |
+| format 'orc'    | The file format; specify `'orc'` for ORC-formatted data. |
+| \<custom-option\>    | ORC-specific custom options are described below. |
+
 <a id="customopts"></a>
 The PXF `hdfs:orc` profile supports the following *read* options. You specify these options in the `LOCATION` clause:
 
@@ -164,7 +189,7 @@ The PXF `hdfs:orc` profile supports a single compression-related write option; y
 
 ## <a id="write"></a>About Writing ORC data
 
-When you insert records into a writable external table, the block(s) of data that you insert are written to one or more files in the directory that you specify in the `LOCATION` clause.
+When you insert records into a writable external table or foreign table, the block(s) of data that you insert are written to one or more files in the directory that you specify in the `LOCATION` clause or `resource` option.
 
 When you insert ORC data records, the `pxf.orc.write.timezone.utc` property in the `pxf-site.xml` file governs how PXF writes timestamp values to the external data store. By default, PXF writes a timestamp type using the UTC time zone. If you require PXF to write a timestamp type using the local time zone of the PXF JVM, set the `pxf.orc.write.timezone.utc` property to `false` for the server and synchronize the PXF configuration.
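+
+For example, a `pxf-site.xml` entry that switches timestamp writes to the PXF JVM's local time zone might look like the following (a sketch; set it per server and then synchronize the PXF configuration):
+
+``` xml
+<property>
+    <name>pxf.orc.write.timezone.utc</name>
+    <value>false</value>
+</property>
+```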
 
@@ -231,6 +256,19 @@ Procedure:
              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
     ```
 
+    OR
+
+    ``` sql
+    testdb=# CREATE SERVER example_orc FOREIGN DATA WRAPPER hdfs_pxf_fdw;
+    testdb=# CREATE USER MAPPING FOR CURRENT_USER SERVER example_orc;
+    testdb=# CREATE FOREIGN TABLE sample_orc(location text, month text, num_orders int, total_sales numeric(10,2), items_sold text[])
+               SERVER example_orc
+               OPTIONS (
+                       resource 'data/pxf_examples/orc_example',
+                       format 'orc'
+               );
+    ```
+
 1. Read the data in the file by querying the `sample_orc` table:
 
     ``` sql
@@ -288,6 +326,19 @@ In this example, you create a writable external table to write some data to the
       FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
     ```
 
+    OR
+
+    ``` sql
+    testdb=# CREATE SERVER example_orc FOREIGN DATA WRAPPER hdfs_pxf_fdw;
+    testdb=# CREATE USER MAPPING FOR CURRENT_USER SERVER example_orc;
+    testdb=# CREATE FOREIGN TABLE sample_orc(location text, month text, num_orders int, total_sales numeric(10,2), items_sold text[])
+               SERVER example_orc
+               OPTIONS (
+                       resource 'data/pxf_examples/orc_example',
+                       format 'orc'
+               );
+    ```
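+
+    If you created the foreign table instead, insert into `sample_orc` directly; the same foreign table definition serves both read and write operations. A minimal sketch with placeholder values (not the rows used elsewhere in this procedure):
+
+    ``` sql
+    testdb=# INSERT INTO sample_orc
+               VALUES ('Frankfurt', 'Mar', 777, 3956.98, '{"computer","printer"}');
+    ```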
+
 1. Write a few records to segment files in the `orc_example` directory by inserting into the `write_to_sample_orc` table:
    
     ``` sql
diff --git a/docs/content/objstore_avro.html.md.erb b/docs/content/objstore_avro.html.md.erb
index 708c1df8..2b742e22 100644
--- a/docs/content/objstore_avro.html.md.erb
+++ b/docs/content/objstore_avro.html.md.erb
@@ -102,7 +102,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <table_name>
 
 | Keyword  | Value |
 |-------|-------------------------------------|
-| \<foreign_server\>    | The named server configuration that PXF uses to access the data. You can override credentials in `CREATE SERVER` statement as described in [Overriding the S3 Server Configuration for External Tables](access_s3.html#s3_override_fdw) |
+| \<foreign_server\>    | The named server configuration that PXF uses to access the data. You can override credentials in the `CREATE SERVER` statement as described in [Overriding the S3 Server Configuration for Foreign Tables](access_s3.html#s3_override_fdw) |
 | resource \<path&#8209;to&#8209;file\>    | The path to the directory or file in the object store. When the `<server_name>` configuration includes a [`pxf.fs.basePath`](cfg_server.html#pxf-fs-basepath) property setting, PXF considers \<path&#8209;to&#8209;file\> to be relative to the base path specified. Otherwise, PXF considers it to be an absolute path. \<path&#8209;to&#8209;file\> must not specify a relative path nor include the dollar sign (`$`) character. |
 | format 'avro'  | File format specification. |
 | \<custom&#8209;option\>=\<value\> | Avro-specific custom options are described in the [PXF HDFS Avro documentation](hdfs_avro.html#customopts). |
diff --git a/docs/content/objstore_orc.html.md.erb b/docs/content/objstore_orc.html.md.erb
index d24f2f0b..069fb578 100644
--- a/docs/content/objstore_orc.html.md.erb
+++ b/docs/content/objstore_orc.html.md.erb
@@ -39,7 +39,7 @@ FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export')
 [DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
 ```
 
-The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL TABLE](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/ref_guide-sql_commands-CREATE_EXTERNAL_TABLE.html) command are described in the table below.
+The specific keywords and values used in the Apache Cloudberry [CREATE EXTERNAL TABLE](https://cloudberry.apache.org/docs/sql-stmts/create-external-table) command are described in the table below.
 
 | Keyword  | Value |
 |-------|-------------------------------------|
@@ -52,6 +52,38 @@ The specific keywords and values used in the Greenplum Database [CREATE EXTERNAL
 
 If you are accessing an S3 object store, you can provide S3 credentials via custom options in the `CREATE EXTERNAL TABLE` command as described in [Overriding the S3 Server Configuration for External Tables](access_s3.html#s3_override_ext).
 
+## <a id="orc_cfdw"></a>Creating the Foreign Table
+
+Use one of the following foreign data wrappers with `format 'orc'`.
+
+| Object Store  | Foreign Data Wrapper |
+|-------|-------------------------------------|
+| Azure Blob Storage   | wasbs_pxf_fdw |
+| Azure Data Lake Storage Gen2    | abfss_pxf_fdw |
+| Google Cloud Storage    | gs_pxf_fdw |
+| MinIO    | s3_pxf_fdw |
+| S3    | s3_pxf_fdw |
+
+The following syntax creates an Apache Cloudberry foreign table that references an ORC-format file:
+
+``` sql
+CREATE SERVER <foreign_server> FOREIGN DATA WRAPPER <store>_pxf_fdw;
+CREATE USER MAPPING FOR <user_name> SERVER <foreign_server>;
+
+CREATE FOREIGN TABLE [ IF NOT EXISTS ] <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+  SERVER <foreign_server>
+  OPTIONS ( resource '<path-to-file>', format 'orc' [, <custom-option> '<value>' [, ...] ]);
+```
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| \<foreign_server\>    | The named server configuration that PXF uses to access the data. You can override credentials in the `CREATE SERVER` statement as described in [Overriding the S3 Server Configuration for Foreign Tables](access_s3.html#s3_override_fdw) |
+| resource \<path&#8209;to&#8209;file\>    | The path to the directory or file in the object store. When the `<server_name>` configuration includes a [`pxf.fs.basePath`](cfg_server.html#pxf-fs-basepath) property setting, PXF considers \<path&#8209;to&#8209;file\> to be relative to the base path specified. Otherwise, PXF considers it to be an absolute path. \<path&#8209;to&#8209;file\> must not specify a relative path nor include the dollar sign (`$`) character. |
+| format 'orc'  | The file format; specify `'orc'` for ORC-formatted data. |
+| \<custom&#8209;option\>=\<value\> | ORC-specific custom options are described in the [PXF HDFS ORC documentation](hdfs_orc.html#customopts). |
+
+
 ## <a id="example"></a>Example
 
 Refer to [Example: Reading an ORC File on HDFS](hdfs_orc.html#read_example) in the PXF Hadoop ORC documentation for an example. Modifications that you must make to run the example with an object store include:
@@ -78,3 +110,16 @@ Refer to [Example: Reading an ORC File on HDFS](hdfs_orc.html#read_example) in t
     FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
     ```
 
+- Or, using the `CREATE FOREIGN TABLE` syntax, create one foreign table for both read and write operations (a short usage sketch follows the example). For example, if your server name is `s3srvcfg`:
+
+    ``` sql
+    CREATE SERVER s3srvcfg FOREIGN DATA WRAPPER s3_pxf_fdw;
+    CREATE USER MAPPING FOR CURRENT_USER SERVER s3srvcfg;
+
+    CREATE FOREIGN TABLE sample_orc (location TEXT, month TEXT, num_orders INT, total_sales NUMERIC(10,2), items_sold TEXT[])
+    SERVER s3srvcfg
+    OPTIONS (
+       resource 'BUCKET/pxf_examples/orc_example',
+       format 'orc'
+    );
+    ```
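+
+    Once the foreign table exists, reads and writes go through the same definition. A hypothetical session with placeholder values:
+
+    ``` sql
+    INSERT INTO sample_orc
+      VALUES ('Cleveland', 'Oct', 3812, 96645.37, '{"phone","television"}');
+
+    SELECT location, sum(total_sales) FROM sample_orc GROUP BY location;
+    ```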

