CurtHagenlocher commented on code in PR #2610:
URL: https://github.com/apache/arrow-adbc/pull/2610#discussion_r2010810377


##########
csharp/src/Drivers/Apache/Hive2/README.md:
##########
@@ -35,11 +35,15 @@ but can also be passed in the call to 
`AdbcDatabase.Connect`.
 | `username`             | The user name used for basic authentication | |
 | `password`             | The password for the user name used for basic 
authentication. | |
 | `adbc.hive.data_type_conv` | Comma-separated list of data conversion 
options. Each option indicates the type of conversion to perform on data 
returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. 
<br><br>Option `none` indicates there is no conversion from Hive type to native 
type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). 
Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform 
conversion (if necessary) from the Hive data type to corresponding Arrow data 
types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and 
`TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` 
| `scalar` |
-| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each 
option indicates the TLS/SSL option when connecting to a Hive server. 
<br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. 
<br><br>Option `allow_self_signed` allows certificate errors due to an unknown 
certificate authority, typically when using a self-signed certificate. Option 
`allow_hostname_mismatch` allow certificate errors due to a mismatch of the 
hostname. (e.g., when connecting through an SSH tunnel). Example 
`adbc.hive.tls_options=allow_self_signed` | |
 | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open 
a new session. Values can be 0 (infinite) or greater than zero. | `30000` |
 | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to 
retrieve in a single batch request. | `50000` |
 | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a 
result, this option sets the length of time (in milliseconds) to wait between 
polls. | `500` |
 | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) 
for a query to complete. Values can be 0 (infinite) or greater than zero. | 
`60` |
+| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | 
`False` |

Review Comment:
   Two high-level comments on the proposed API changes:
   1) I think it would good to group these together logically as they're 
logically related, so it would be good if they all started with the same 
distinct prefix.
   2) While "SSL" was the original protocol developed for security HTTP, it has 
since been succeeded by "TLS" and so I think it's broadly preferable to use 
"TLS" instead of "SSL".
   
   Might I propose the following set of options?
   `adbc.http_options.tls.enabled`: Whether or not to use TLS, One of `True`, 
`False`
   `adbc.http_options.tls.allow_self_signed`: ...
   `adbc.http_options.tls.trusted_certificate_path`: etc.



##########
csharp/src/Drivers/Apache/Spark/README.md:
##########
@@ -36,11 +36,15 @@ but can also be passed in the call to 
`AdbcDatabase.Connect`.
 | `username`             | The user name used for basic authentication | |
 | `password`             | The password for the user name used for basic 
authentication. | |
 | `adbc.spark.data_type_conv` | Comma-separated list of data conversion 
options. Each option indicates the type of conversion to perform on data 
returned from the Spark server. <br><br>Allowed values: `none`, `scalar`. 
<br><br>Option `none` indicates there is no conversion from Spark type to 
native type (i.e., no conversion from String to Timestamp for Apache Spark over 
HTTP). Example `adbc.spark.conv_data_type=none`. <br><br>Option `scalar` will 
perform conversion (if necessary) from the Spark data type to corresponding 
Arrow data types for types `DATE/Date32/DateTime`, 
`DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. 
Example `adbc.spark.conv_data_type=scalar` | `scalar` |
-| `adbc.spark.tls_options` | Comma-separated list of TLS/SSL options. Each 
option indicates the TLS/SSL option when connecting to a Spark server. 
<br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. 
<br><br>Option `allow_self_signed` allows certificate errors due to an unknown 
certificate authority, typically when using a self-signed certificate. Option 
`allow_hostname_mismatch` allow certificate errors due to a mismatch of the 
hostname. (e.g., when connecting through an SSH tunnel). Example 
`adbc.spark.tls_options=allow_self_signed` | |
 | `adbc.spark.connect_timeout_ms` | Sets the timeout (in milliseconds) to open 
a new session. Values can be 0 (infinite) or greater than zero. | `30000` |
 | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to 
retrieve in a single batch request. | `50000` |
 | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a 
result, this option sets the length of time (in milliseconds) to wait between 
polls. | `500` |
 | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) 
for a query to complete. Values can be 0 (infinite) or greater than zero. | 
`60` |
+| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | 
`False` |

Review Comment:
   (On the whole, the comments made on Hive2/README.md apply to Spark as well.



##########
csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs:
##########
@@ -0,0 +1,106 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using System;
+using System.IO;
+using System.Collections.Generic;
+using System.Net.Security;
+using System.Security.Cryptography.X509Certificates;
+using System.Net.Http;
+
+namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
+{
+    class TlsProperties
+    {
+        public bool IsSslEnabled { get; set; }

Review Comment:
   Consider `IsTlsEnabled`



##########
csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs:
##########
@@ -0,0 +1,106 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using System;
+using System.IO;
+using System.Collections.Generic;
+using System.Net.Security;
+using System.Security.Cryptography.X509Certificates;
+using System.Net.Http;
+
+namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
+{
+    class TlsProperties
+    {
+        public bool IsSslEnabled { get; set; }
+        public bool EnableServerCertificateValidation { get; set; }
+        public bool AllowHostnameMismatch { get; set; }
+        public bool AllowSelfSigned { get; set; }
+        public string? TrustedCertificatePath { get; set; }
+    }
+    static class HiveServer2TlsImpl

Review Comment:
   nit: insert blank line before `class`.



##########
csharp/src/Drivers/Apache/Hive2/README.md:
##########
@@ -35,11 +35,15 @@ but can also be passed in the call to 
`AdbcDatabase.Connect`.
 | `username`             | The user name used for basic authentication | |
 | `password`             | The password for the user name used for basic 
authentication. | |
 | `adbc.hive.data_type_conv` | Comma-separated list of data conversion 
options. Each option indicates the type of conversion to perform on data 
returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. 
<br><br>Option `none` indicates there is no conversion from Hive type to native 
type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). 
Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform 
conversion (if necessary) from the Hive data type to corresponding Arrow data 
types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and 
`TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` 
| `scalar` |
-| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each 
option indicates the TLS/SSL option when connecting to a Hive server. 
<br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. 
<br><br>Option `allow_self_signed` allows certificate errors due to an unknown 
certificate authority, typically when using a self-signed certificate. Option 
`allow_hostname_mismatch` allow certificate errors due to a mismatch of the 
hostname. (e.g., when connecting through an SSH tunnel). Example 
`adbc.hive.tls_options=allow_self_signed` | |

Review Comment:
   It would be nice and relatively low-cost to stay backwards-compatible by 
continuing to support the existing option for another release before removing 
it. (We haven't declared "1.0" yet so I don't think it's strictly necessary, 
but I think erring on the side of accommodating consumption scenarios is 
broadly good.)



##########
csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs:
##########
@@ -0,0 +1,106 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using System;
+using System.IO;
+using System.Collections.Generic;
+using System.Net.Security;
+using System.Security.Cryptography.X509Certificates;
+using System.Net.Http;
+
+namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
+{
+    class TlsProperties
+    {
+        public bool IsSslEnabled { get; set; }
+        public bool EnableServerCertificateValidation { get; set; }
+        public bool AllowHostnameMismatch { get; set; }
+        public bool AllowSelfSigned { get; set; }
+        public string? TrustedCertificatePath { get; set; }
+    }
+    static class HiveServer2TlsImpl
+    {
+        static internal TlsProperties 
GetTlsOptions(IReadOnlyDictionary<string, string> properties)
+        {
+            TlsProperties tlsProperties = new TlsProperties();
+            properties.TryGetValue(TlsOptions.IsSslEnabled, out string? 
isSslEnabled);
+            properties.TryGetValue(AdbcOptions.Uri, out string? uri);
+            if (!string.IsNullOrWhiteSpace(uri))

Review Comment:
   This generally assumes that the `out` parameter has a specific well-defined 
value even when the `Try` returns false. I think it's a better practice to 
structure the condition as `if (properties.TryGetValue(AdbcOptions.Uri, out 
string? uri) && !string.IsNullOrWhiteSpace(uri))`. (Same applies to all 
`TryGetValue` calls in this method.)



##########
csharp/src/Drivers/Apache/Hive2/README.md:
##########
@@ -35,11 +35,15 @@ but can also be passed in the call to 
`AdbcDatabase.Connect`.
 | `username`             | The user name used for basic authentication | |
 | `password`             | The password for the user name used for basic 
authentication. | |
 | `adbc.hive.data_type_conv` | Comma-separated list of data conversion 
options. Each option indicates the type of conversion to perform on data 
returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. 
<br><br>Option `none` indicates there is no conversion from Hive type to native 
type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). 
Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform 
conversion (if necessary) from the Hive data type to corresponding Arrow data 
types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and 
`TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` 
| `scalar` |
-| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each 
option indicates the TLS/SSL option when connecting to a Hive server. 
<br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. 
<br><br>Option `allow_self_signed` allows certificate errors due to an unknown 
certificate authority, typically when using a self-signed certificate. Option 
`allow_hostname_mismatch` allow certificate errors due to a mismatch of the 
hostname. (e.g., when connecting through an SSH tunnel). Example 
`adbc.hive.tls_options=allow_self_signed` | |
 | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open 
a new session. Values can be 0 (infinite) or greater than zero. | `30000` |
 | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to 
retrieve in a single batch request. | `50000` |
 | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a 
result, this option sets the length of time (in milliseconds) to wait between 
polls. | `500` |
 | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) 
for a query to complete. Values can be 0 (infinite) or greater than zero. | 
`60` |
+| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | 
`False` |
+| `adbc.apache.enable_server_certificate_validation` | If ssl server 
certificate validation needs to enabled or not. One of `True`, `False`. If set 
to False, all certificate validation errors are ignored | `True` |

Review Comment:
   The option name `enable_server_certificate_validation` implies that we're 
opting into enabling it. I think that it's more semantically clear to have an 
option `disable_server_certificate_validation` with a default value of false to 
give the correct impression that the settings are secure by default and that 
you have to opt into less security.



##########
csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs:
##########
@@ -0,0 +1,106 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+using System;
+using System.IO;
+using System.Collections.Generic;
+using System.Net.Security;
+using System.Security.Cryptography.X509Certificates;
+using System.Net.Http;
+
+namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
+{
+    class TlsProperties
+    {
+        public bool IsSslEnabled { get; set; }
+        public bool EnableServerCertificateValidation { get; set; }
+        public bool AllowHostnameMismatch { get; set; }
+        public bool AllowSelfSigned { get; set; }
+        public string? TrustedCertificatePath { get; set; }
+    }
+    static class HiveServer2TlsImpl
+    {
+        static internal TlsProperties 
GetTlsOptions(IReadOnlyDictionary<string, string> properties)
+        {
+            TlsProperties tlsProperties = new TlsProperties();
+            properties.TryGetValue(TlsOptions.IsSslEnabled, out string? 
isSslEnabled);
+            properties.TryGetValue(AdbcOptions.Uri, out string? uri);
+            if (!string.IsNullOrWhiteSpace(uri))
+            {
+                var uriValue = new Uri(uri);
+                tlsProperties.IsSslEnabled = uriValue.Scheme == 
Uri.UriSchemeHttps || (bool.TryParse(isSslEnabled, out bool isSslEnabledBool) 
&& isSslEnabledBool);
+            }
+            else
+            {
+                tlsProperties.IsSslEnabled = bool.TryParse(isSslEnabled, out 
bool isSslEnabledBool) && isSslEnabledBool;
+            }
+            if (!tlsProperties.IsSslEnabled)
+            {
+                return tlsProperties;
+            }
+            tlsProperties.IsSslEnabled = true;
+            
properties.TryGetValue(TlsOptions.EnableServerCertificateValidation, out 
string? enableServerCertificateValidation);
+            if (bool.TryParse(enableServerCertificateValidation, out bool 
enableServerCertificateValidationBool) && 
!enableServerCertificateValidationBool)
+            {
+                tlsProperties.EnableServerCertificateValidation = false;
+                return tlsProperties;
+            }
+            tlsProperties.EnableServerCertificateValidation = true;
+            properties.TryGetValue(TlsOptions.AllowHostnameMismatch, out 
string? allowHostnameMismatch);
+            tlsProperties.AllowHostnameMismatch = 
bool.TryParse(allowHostnameMismatch, out bool allowHostnameMismatchBool) && 
allowHostnameMismatchBool;
+            properties.TryGetValue(TlsOptions.AllowSelfSigned, out string? 
allowSelfSigned);
+            tlsProperties.AllowSelfSigned = bool.TryParse(allowSelfSigned, out 
bool allowSelfSignedBool) && allowSelfSignedBool;
+            if (tlsProperties.AllowSelfSigned)
+            {
+                properties.TryGetValue(TlsOptions.TrustedCertificatePath, out 
string? trustedCertificatePath);
+                if (trustedCertificatePath == null) return tlsProperties;
+                tlsProperties.TrustedCertificatePath = trustedCertificatePath 
!= "" && File.Exists(trustedCertificatePath) ? trustedCertificatePath : throw 
new FileNotFoundException("Trusted certificate path is invalid or file does not 
exist.");
+            }
+            return tlsProperties;
+        }
+
+        static internal HttpClientHandler NewHttpClientHandler(TlsProperties 
tlsProperties)
+        {
+            HttpClientHandler httpClientHandler = new();
+            if (tlsProperties.IsSslEnabled)
+            {
+                httpClientHandler.ServerCertificateCustomValidationCallback = 
(request, certificate, chain, policyErrors) =>
+                {
+                    if (policyErrors == SslPolicyErrors.None || 
!tlsProperties.EnableServerCertificateValidation) return true;
+                    if 
(string.IsNullOrEmpty(tlsProperties.TrustedCertificatePath))
+                    {
+                        return
+                            
(!policyErrors.HasFlag(SslPolicyErrors.RemoteCertificateChainErrors) || 
tlsProperties.AllowSelfSigned)
+                        && 
(!policyErrors.HasFlag(SslPolicyErrors.RemoteCertificateNameMismatch) || 
tlsProperties.AllowHostnameMismatch);
+                    }
+                    if (certificate == null)
+                        return false;
+                    X509Certificate2 customCertificate = new 
X509Certificate2(tlsProperties.TrustedCertificatePath);

Review Comment:
   What's the expected format of this file? Does it align with existing 
standards for supplying certificates to drivers?
   
   It would be good to update the README.md documentation with this information.



##########
csharp/src/Drivers/Apache/Hive2/README.md:
##########
@@ -35,11 +35,15 @@ but can also be passed in the call to 
`AdbcDatabase.Connect`.
 | `username`             | The user name used for basic authentication | |
 | `password`             | The password for the user name used for basic 
authentication. | |
 | `adbc.hive.data_type_conv` | Comma-separated list of data conversion 
options. Each option indicates the type of conversion to perform on data 
returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. 
<br><br>Option `none` indicates there is no conversion from Hive type to native 
type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). 
Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform 
conversion (if necessary) from the Hive data type to corresponding Arrow data 
types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and 
`TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` 
| `scalar` |
-| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each 
option indicates the TLS/SSL option when connecting to a Hive server. 
<br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. 
<br><br>Option `allow_self_signed` allows certificate errors due to an unknown 
certificate authority, typically when using a self-signed certificate. Option 
`allow_hostname_mismatch` allow certificate errors due to a mismatch of the 
hostname. (e.g., when connecting through an SSH tunnel). Example 
`adbc.hive.tls_options=allow_self_signed` | |
 | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open 
a new session. Values can be 0 (infinite) or greater than zero. | `30000` |
 | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to 
retrieve in a single batch request. | `50000` |
 | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a 
result, this option sets the length of time (in milliseconds) to wait between 
polls. | `500` |
 | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) 
for a query to complete. Values can be 0 (infinite) or greater than zero. | 
`60` |
+| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | 
`False` |

Review Comment:
   (We even use "Tls" in most other places in this codebase.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to