This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch pyiceberg-0.6.x
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git
The following commit(s) were added to refs/heads/pyiceberg-0.6.x by this push:
new 813adbed [0.6.x] Backport PR #324 and #493 for fixing dead links in docs (#556)
813adbed is described below
commit 813adbedd68fea6fa502b0da4c799d064f4317f3
Author: Honah J <[email protected]>
AuthorDate: Thu Mar 28 23:43:55 2024 -0700
[0.6.x] Backport PR #324 and #493 for fixing dead links in docs (#556)
* Github Action to check links in documentation (#324)
* add github add to check md link
* Only run under `mkdocs/**`
* ws
* make lint
---------
Co-authored-by: Fokko Driesprong <[email protected]>
* Fix dead links in docs (#493)
Backport to 0.6.1
---------
Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
---
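The commit message says the link check should "Only run under `mkdocs/**`". In the workflow file added by this commit, that `paths` filter applies only to the `push` trigger, and `pull_request` has no filter at all. As a rough illustration, here is a minimal Python sketch of how the two triggers combine. It assumes GitHub's rule that a `push` must satisfy both its `branches` and its `paths` filter. `should_run` and its arguments are hypothetical, `fnmatch` only approximates GitHub's glob syntax, and pull request activity types are ignored.

```python
from fnmatch import fnmatch

# Filters copied from the push trigger in .github/workflows/check-md-link.yml.
PUSH_BRANCHES = ["main"]
PUSH_PATHS = ["mkdocs/**"]


def should_run(event: str, branch: str, changed_paths: list[str]) -> bool:
    """Hypothetical, simplified model of when the markdown-link-check job starts.

    Assumption: fnmatch's "*" also matches "/", which only approximates
    GitHub's glob rules for "**".
    """
    if event == "push":
        # A push must match the branch filter AND touch at least one matching path.
        branch_ok = any(fnmatch(branch, pattern) for pattern in PUSH_BRANCHES)
        path_ok = any(
            fnmatch(path, pattern)
            for path in changed_paths
            for pattern in PUSH_PATHS
        )
        return branch_ok and path_ok
    if event == "pull_request":
        # No branches/paths filter is configured, so every PR triggers the job
        # (default activity types assumed).
        return True
    return False


print(should_run("push", "main", ["mkdocs/docs/index.md"]))                 # True
print(should_run("push", "main", ["pyiceberg/catalog/sql.py"]))             # False
print(should_run("push", "pyiceberg-0.6.x", ["mkdocs/docs/index.md"]))      # False
print(should_run("pull_request", "feature", ["pyiceberg/catalog/sql.py"]))  # True
```

Under this model, a push to the `pyiceberg-0.6.x` branch does not trigger the check, because the branch filter lists only `main`.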
.github/workflows/check-md-link.yml | 16 ++++++++++++++++
mkdocs/docs/SUMMARY.md | 4 ++++
mkdocs/docs/configuration.md | 20 ++++++++++++++++++++
mkdocs/docs/index.md | 2 +-
4 files changed, 41 insertions(+), 1 deletion(-)
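Most of the docs changes in this diff wrap the navigation list and the configuration tables in `<!-- markdown-link-check-disable -->` / `<!-- markdown-link-check-enable-->` comments. For the tables, the likely reason is that they contain placeholder URLs such as `https://10.0.19.25/` and `https://rest-catalog/ws`, which a link checker would report as dead. Below is a minimal Python sketch of that scoping under a simple line-based model. `checkable_links` and its regexes are hypothetical and much simpler than markdown-link-check's actual parser.

```python
import re

# Hypothetical, line-based model of markdown-link-check's inline directives.
# Only the two directives used in this commit are handled.
DISABLE = re.compile(r"<!--\s*markdown-link-check-disable\s*-->")
ENABLE = re.compile(r"<!--\s*markdown-link-check-enable\s*-->")
# Either a [text](target) link or a bare http(s) URL, e.g. in a table cell.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)|(https?://[^\s|)]+)")


def checkable_links(markdown: str) -> list[str]:
    """Return link targets that fall outside disabled regions."""
    links = []
    enabled = True
    for line in markdown.splitlines():
        if DISABLE.search(line):
            enabled = False
            continue
        if ENABLE.search(line):
            enabled = True
            continue
        if enabled:
            for md_target, bare_url in LINK.findall(line):
                links.append(md_target or bare_url)
    return links


doc = """- [Getting started](index.md)
<!-- markdown-link-check-disable -->
| s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint |
<!-- markdown-link-check-enable-->
See the [catalog](https://iceberg.apache.org/concepts/catalog/) page."""

print(checkable_links(doc))
# ['index.md', 'https://iceberg.apache.org/concepts/catalog/']
```

The placeholder endpoint inside the disabled region is never collected. Links outside the region, including the corrected `concepts/catalog/` URL, are still checked.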
diff --git a/.github/workflows/check-md-link.yml b/.github/workflows/check-md-link.yml
new file mode 100644
index 00000000..eec019a1
--- /dev/null
+++ b/.github/workflows/check-md-link.yml
@@ -0,0 +1,16 @@
+name: Check Markdown links
+
+on:
+ push:
+ paths:
+ - mkdocs/**
+ branches:
+ - 'main'
+ pull_request:
+
+jobs:
+ markdown-link-check:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@master
+ - uses: gaurav-nelson/github-action-markdown-link-check@v1
diff --git a/mkdocs/docs/SUMMARY.md b/mkdocs/docs/SUMMARY.md
index 40ba0bff..5cf753d4 100644
--- a/mkdocs/docs/SUMMARY.md
+++ b/mkdocs/docs/SUMMARY.md
@@ -17,6 +17,8 @@
<!-- prettier-ignore-start -->
+<!-- markdown-link-check-disable -->
+
- [Getting started](index.md)
- [Configuration](configuration.md)
- [CLI](cli.md)
@@ -28,4 +30,6 @@
- [How to release](how-to-release.md)
- [Code Reference](reference/)
+<!-- markdown-link-check-enable-->
+
<!-- prettier-ignore-end -->
diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md
index 8acc0a98..d0c71b59 100644
--- a/mkdocs/docs/configuration.md
+++ b/mkdocs/docs/configuration.md
@@ -81,6 +81,8 @@ For the FileIO there are several configuration options available:
### S3
+<!-- markdown-link-check-disable -->
+
| Key | Example | Description |
| -------------------- | ------------------------ | ---------------------------------------------------------------------------------------------------- |
| s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
@@ -91,8 +93,12 @@ For the FileIO there are several configuration options available:
| s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
+<!-- markdown-link-check-enable-->
+
### HDFS
+<!-- markdown-link-check-disable -->
+
| Key | Example | Description |
| -------------------- | ------------------- | ------------------------------------------------ |
| hdfs.host | https://10.0.19.25/ | Configure the HDFS host to connect to |
@@ -100,8 +106,12 @@ For the FileIO there are several configuration options available:
| hdfs.user | user | Configure the HDFS username used for connection. |
| hdfs.kerberos_ticket | kerberos_ticket | Configure the path to the Kerberos ticket cache. |
+<!-- markdown-link-check-enable-->
+
### Azure Data lake
+<!-- markdown-link-check-disable -->
+
| Key | Example | Description |
| ----------------------- | ----------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=http://localhost/ | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
@@ -112,8 +122,12 @@ For the FileIO there are several configuration options available:
| adlfs.client-id | ad667be4-b811-11ed-afa1-0242ac120002 | The client-id |
| adlfs.client-secret | oCA3R6P\*ka#oa1Sms2J74z... | The client-secret |
+<!-- markdown-link-check-enable-->
+
### Google Cloud Storage
+<!-- markdown-link-check-disable -->
+
| Key | Example | Description |
| -------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------- |
| gcs.project-id | my-gcp-project | Configure Google Cloud Project for GCS FileIO. |
@@ -128,6 +142,8 @@ For the FileIO there are several configuration options available:
| gcs.default-location | US | Configure the default location where buckets are created, like 'US' or 'EUROPE-WEST3'. |
| gcs.version-aware | False | Configure whether to support object versioning on the GCS bucket. |
+<!-- markdown-link-check-enable-->
+
## REST Catalog
```yaml
@@ -145,6 +161,8 @@ catalog:
cabundle: /absolute/path/to/cabundle.pem
```
+<!-- markdown-link-check-disable -->
+
| Key | Example | Description |
| ---------------------- | ----------------------- | -------------------------------------------------------------------------------------------------- |
| uri | https://rest-catalog/ws | URI identifying the REST Server |
@@ -155,6 +173,8 @@ catalog:
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| rest.authorization-url | https://auth-service/cc | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
+<!-- markdown-link-check-enable-->
+
## SQL Catalog
The SQL catalog requires a database for its backend. PyIceberg supports PostgreSQL and SQLite through psycopg2. The database connection has to be configured using the `uri` property. See SQLAlchemy's [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls):
diff --git a/mkdocs/docs/index.md b/mkdocs/docs/index.md
index a8c2c6bd..1fee9cc6 100644
--- a/mkdocs/docs/index.md
+++ b/mkdocs/docs/index.md
@@ -61,7 +61,7 @@ You either need to install `s3fs`, `adlfs`, `gcsfs`, or `pyarrow` to be able to
## Connecting to a catalog
-Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
+Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/concepts/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
For the sake of demonstration, we'll configure the catalog to use the `SqlCatalog` implementation, which will store information in a local `sqlite` database. We'll also configure the catalog to store data files in the local filesystem instead of an object store. This should not be used in production due to the limited scalability.