This is an automated email from the ASF dual-hosted git repository.
curth pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-adbc.git
The following commit(s) were added to refs/heads/main by this push:
new 243561997 docs(csharp/src/Drivers/Apache/Spark): document connection properties (#2019)
243561997 is described below
commit 24356199756ee65e101d9e160c466d22f2acd961
Author: Bruce Irschick <[email protected]>
AuthorDate: Wed Jul 17 20:41:34 2024 -0700
docs(csharp/src/Drivers/Apache/Spark): document connection properties (#2019)
Add documentation for connection properties
* updates existing documentation for Apache/Thrift-based drivers
---
csharp/src/Drivers/Apache/Spark/README.md | 84 +++++++++++++++++++++++++++++++
csharp/src/Drivers/Apache/readme.md | 39 +++-----------
2 files changed, 90 insertions(+), 33 deletions(-)
diff --git a/csharp/src/Drivers/Apache/Spark/README.md b/csharp/src/Drivers/Apache/Spark/README.md
new file mode 100644
index 000000000..30116b049
--- /dev/null
+++ b/csharp/src/Drivers/Apache/Spark/README.md
@@ -0,0 +1,84 @@
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+-->
+
+# Spark Driver
+
+## Database and Connection Properties
+
+Properties should be passed in the call to `SparkDriver.Open`,
+but can also be passed in the call to `AdbcDatabase.Connect`.
+
+| Property | Description | Default |
+| :--- | :--- | :--- |
+| `adbc.spark.host` | Host name for the data source. Do not include the scheme or port number. Example: `sparkserver.region.cloudapp.azure.com` | |
+| `adbc.spark.port` | The port number the data source listens on for new connections. | `443` |
+| `adbc.spark.path` | The URI path on the data source server. Example: `sql/protocolv1/o/0123456789123456/01234-0123456-source` | |
+| `adbc.spark.token` | For token-based authentication, the token used to authenticate with the data source. Example: `abcdef0123456789` | |
+<!-- Add these properties when basic authentication is available.
+| `adbc.spark.scheme` | The HTTP or HTTPS scheme to use. Allowed values: `http`, `https`. | `https` when the port is 443 or empty; `http` otherwise. |
+| `auth_type` | An indicator of the intended type of authentication. Allowed values: `basic`, `token`. This property is optional. The authentication type can be inferred from `token`, `username`, and `password`. If a `token` value is provided, token authentication is used. Otherwise, if both `username` and `password` values are provided, basic authentication is used. | |
+| `username` | The user name used for basic authentication. | |
+| `password` | The password for the user name used for basic authentication. | |
+-->
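+
+For example, a database and connection might be opened as follows. This is a
+minimal sketch: the host, path, and token values are placeholders, and the
+driver namespace and exact method signatures shown here are assumptions that
+may differ between versions.
+
+```csharp
+using System.Collections.Generic;
+using Apache.Arrow.Adbc;
+using Apache.Arrow.Adbc.Drivers.Apache.Spark; // assumed namespace
+
+var parameters = new Dictionary<string, string>
+{
+    // Placeholder values; substitute your own server details.
+    ["adbc.spark.host"] = "sparkserver.region.cloudapp.azure.com",
+    ["adbc.spark.path"] = "sql/protocolv1/o/0123456789123456/01234-0123456-source",
+    ["adbc.spark.token"] = "abcdef0123456789",
+};
+
+// Properties are passed to SparkDriver.Open; additional options may
+// also be passed to AdbcDatabase.Connect.
+AdbcDatabase database = new SparkDriver().Open(parameters);
+AdbcConnection connection = database.Connect(new Dictionary<string, string>());
+```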
+
+## Spark Types
+
+The following table depicts how the Spark ADBC driver converts a Spark type to an Arrow type and a .NET type:
+
+| Spark Type | Arrow Type | C# Type |
+| :--- | :---: | :---: |
+| ARRAY* | String | string |
+| BIGINT | Int64 | long |
+| BINARY | Binary | byte[] |
+| BOOLEAN | Boolean | bool |
+| CHAR | String | string |
+| DATE | Date32 | DateTime |
+| DECIMAL | Decimal128 | SqlDecimal |
+| DOUBLE | Double | double |
+| FLOAT | Float | float |
+| INT | Int32 | int |
+| INTERVAL_DAY_TIME+ | String | string |
+| INTERVAL_YEAR_MONTH+ | String | string |
+| MAP* | String | string |
+| NULL | Null | null |
+| SMALLINT | Int16 | short |
+| STRING | String | string |
+| STRUCT* | String | string |
+| TIMESTAMP | Timestamp | DateTimeOffset |
+| TINYINT | Int8 | sbyte |
+| UNION | String | string |
+| USER_DEFINED | String | string |
+| VARCHAR | String | string |
+
+\* Complex types are returned as strings<br>
+\+ Interval types are returned as strings
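+
+Once a connection is open, results arrive with the Arrow types shown above and
+can be read through the standard ADBC statement pattern. The sketch below
+assumes `connection` is an open `AdbcConnection`; member names follow the ADBC
+C# API shape and may differ between versions.
+
+```csharp
+using Apache.Arrow;
+using Apache.Arrow.Adbc;
+
+AdbcStatement statement = connection.CreateStatement();
+statement.SqlQuery = "SELECT CAST(1 AS BIGINT) AS id";
+QueryResult result = statement.ExecuteQuery();
+
+// A Spark BIGINT column is read as an Arrow Int64 array (C# long),
+// per the type-mapping table above.
+RecordBatch batch;
+while ((batch = result.Stream.ReadNextRecordBatchAsync().Result) != null)
+{
+    var column = (Int64Array)batch.Column(0);
+    long? value = column.GetValue(0);
+}
+```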
+
+## Supported Variants
+
+### Spark on Databricks
+
+Support for Spark on Databricks is the most mature.
+
+The Spark ADBC driver supports token-based authentication using a
+[Databricks personal access token](https://docs.databricks.com/en/dev-tools/auth/pat.html).
+Basic (username and password) authentication is not supported at this time.
+
+### Native Apache Spark
+
+This is currently unsupported.
diff --git a/csharp/src/Drivers/Apache/readme.md b/csharp/src/Drivers/Apache/readme.md
index ec385f2e2..38d616074 100644
--- a/csharp/src/Drivers/Apache/readme.md
+++ b/csharp/src/Drivers/Apache/readme.md
@@ -18,6 +18,7 @@
-->
# Thrift-based Apache connectors
+
This library contains code for ADBC drivers built on top of the Thrift protocol with Arrow support:
- Hive
@@ -27,6 +28,7 @@ This library contains code for ADBC drivers built on top of the Thrift protocol
Each driver is at a different state of implementation.
## Custom generation
+
Typically, [Thrift](https://thrift.apache.org/) code is generated from the Thrift compiler. And that is mostly true here as well. However, some files were further edited to include Arrow support. These contain the phrase `BUT THIS FILE HAS BEEN HAND EDITED TO SUPPORT ARROW SO REGENERATE AT YOUR OWN RISK` at the top. Some of these files include:
```
@@ -41,55 +43,26 @@
arrow-adbc/csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TStringColumn.cs
```
# Hive
+
The Hive classes serve as the base class for Spark and Impala, since both of those platforms implement Hive capabilities.
Core functionality of the Hive classes beyond the base library implementation is under development, has limited functionality, and may produce errors.
# Impala
+
The Impala classes are under development, have limited functionality, and may produce errors.
# Spark
-The Spark classes are intended for use against native Spark and Spark on Databricks.
-## Spark Types
-
-The following table depicts how the Spark ADBC driver converts a Spark type to an Arrow type and a .NET type:
-
-| Spark Type | Arrow Type | C# Type |
-| :--- | :---: | :---: |
-| ARRAY* | String | string |
-| BIGINT | Int64 | long |
-| BINARY | Binary | byte[] |
-| BOOLEAN | Boolean | bool |
-| CHAR | String | string |
-| DATE | Date32 | DateTime |
-| DECIMAL | Decimal128 | SqlDecimal |
-| DOUBLE | Double | double |
-| FLOAT | Float | float |
-| INT | Int32 | int |
-| INTERVAL_DAY_TIME+ | String | string |
-| INTERVAL_YEAR_MONTH+ | String | string |
-| MAP* | String | string |
-| NULL | Null | null |
-| SMALLINT | Int16 | short |
-| STRING | String | string |
-| STRUCT* | String | string |
-| TIMESTAMP | Timestamp | DateTimeOffset |
-| TINYINT | Int8 | sbyte |
-| UNION | String | string |
-| USER_DEFINED | String | string |
-| VARCHAR | String | string |
-
-\* Complex types are returned as strings<br>
-\+ Interval types are returned as strings
+The Spark classes are intended for use against native Spark and Spark on Databricks.
+For more details, see the [Spark Driver](Spark/README.md) documentation.
## Known Limitations
1. The API `SparkConnection.GetObjects` is not fully tested at this time.
1. It may not return all catalogs and schemas on the server.
1. It may throw an exception when returning object metadata from multiple catalogs and schemas.
-1. API `Connection.GetTableSchema` does not return correct precision and scale for `NUMERIC`/`DECIMAL` types.
1. When a `NULL` value is returned for a `BINARY` type, it is returned as an empty array instead of the expected `null`.
1. Result set metadata does not provide information about the nullability of each column. Columns are marked as `nullable` by default, which may not be accurate.
1. The **Impala** driver is untested and is currently unsupported.