TheNeuralBit commented on a change in pull request #12823:
URL: https://github.com/apache/beam/pull/12823#discussion_r496278227
##########
File path: website/www/site/content/en/documentation/io/built-in/snowflake.md
##########
@@ -362,3 +635,208 @@ static SnowflakeIO.CsvMapper<GenericRecord>
getCsvMapper() {
};
}
{{< /highlight >}}
+## Using SnowflakeIO in Python SDK
+### Intro
+The Snowflake cross-language implementation supports both reading and writing operations for the Python programming language, thanks to cross-language transforms, which are part of the [Portability Framework Roadmap](https://beam.apache.org/roadmap/portability/) that aims to provide full interoperability across the Beam ecosystem. From a developer perspective, this means the possibility of combining transforms written in different languages (Java/Python/Go).
+
+Currently, cross-language stably supports only [Apache Flink](https://flink.apache.org/) as a runner, but there are plans to support all runners.
+For more information about cross-language, please see the [multi-SDK efforts](https://beam.apache.org/roadmap/connectors-multi-sdk/) and [Beam on top of Flink](https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html) articles.
+
+### Set up
+Please see [Apache Beam with Flink runner](https://beam.apache.org/documentation/runners/flink/) for setup instructions.
+
+### Reading from Snowflake
+One of the functions of SnowflakeIO is reading Snowflake tables, either full tables via table name or custom data via query. The output of the read transform is a [PCollection](https://beam.apache.org/releases/pydoc/2.20.0/apache_beam.pvalue.html#apache_beam.pvalue.PCollection) of a user-defined data type.
+#### General usage
+{{< highlight >}}
+OPTIONS = [
+ "--runner=FlinkRunner",
+ "--flink_version=1.10",
+ "--flink_master=localhost:8081",
+ "--environment_type=LOOPBACK"
+]
+
+with TestPipeline(options=PipelineOptions(OPTIONS)) as p:
+ (p
+ | ReadFromSnowflake(
+ server_name=<SNOWFLAKE SERVER NAME>,
+ username=<SNOWFLAKE USERNAME>,
+ password=<SNOWFLAKE PASSWORD>,
+ o_auth_token=<OAUTH TOKEN>,
+ private_key_path=<PATH TO P8 FILE>,
+ raw_private_key=<PRIVATE_KEY>,
+ private_key_passphrase=<PASSWORD FOR KEY>,
+ schema=<SNOWFLAKE SCHEMA>,
+ database=<SNOWFLAKE DATABASE>,
+ staging_bucket_name=<GCS BUCKET NAME>,
+ storage_integration_name=<SNOWFLAKE STORAGE INTEGRATION NAME>,
+ csv_mapper=<CSV MAPPER FUNCTION>,
+ table=<SNOWFLAKE TABLE>,
+ query=<IF NOT TABLE THEN QUERY>,
+ role=<SNOWFLAKE ROLE>,
+ warehouse=<SNOWFLAKE WAREHOUSE>,
+ expansion_service=<EXPANSION SERVICE ADDRESS>))
+{{< /highlight >}}
+
+#### Required parameters
+- `server_name` Full Snowflake server name with an account, zone, and domain.
+
+- `schema` Name of the Snowflake schema in the database to use.
+
+- `database` Name of the Snowflake database to use.
+
+- `staging_bucket_name` Name of the Google Cloud Storage bucket. The bucket will be used as a temporary location for storing CSV files. Those temporary directories will be named `sf_copy_csv_DATE_TIME_RANDOMSUFFIX` and they will be removed automatically once the Read operation finishes.
+
+- `storage_integration_name` Name of a Snowflake storage integration object created according to the [Snowflake documentation](https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html).
+
+- `csv_mapper` Specifies a function which must translate an array of strings to a user-defined object. SnowflakeIO uses a [COPY INTO <location>](https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html) statement to move data from a Snowflake table to Google Cloud Storage as CSV files. These files are then downloaded via [FileIO](https://beam.apache.org/releases/javadoc/2.3.0/index.html?org/apache/beam/sdk/io/FileIO.html) and processed line by line. Each line is split into an array of strings using the [OpenCSV](http://opencsv.sourceforge.net/) library. The job of the csv_mapper function is to give the user the possibility to convert the array of strings to a user-defined type, e.g. GenericRecord for Avro or Parquet files, or custom objects.
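+For illustration, a csv_mapper might look like the following sketch. The column layout and the dict-based row type here are assumptions for the example, not part of the connector's API:

```python
# Hypothetical csv_mapper for a table with two columns (id INTEGER, name VARCHAR).
# ReadFromSnowflake calls it once per CSV record, passing the array of strings
# produced by OpenCSV; it returns the user-defined representation of the row.
def csv_mapper(strings_array):
    return {
        'id': int(strings_array[0]),
        'name': strings_array[1],
    }
```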
Review comment:
+1, thanks
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]