eric-wang-1990 commented on code in PR #2884:
URL: https://github.com/apache/arrow-adbc/pull/2884#discussion_r2110506372
##########
csharp/src/Drivers/Databricks/DatabricksStatement.cs:
##########
@@ -151,5 +158,233 @@ internal void SetMaxBytesPerFile(long maxBytesPerFile)
{
this.maxBytesPerFile = maxBytesPerFile;
}
+
+    /// <summary>
+    /// Helper method to handle the special case for the "SPARK" catalog in metadata queries.
+    ///
+    /// Why:
+    /// - In Databricks, the legacy "SPARK" catalog is used as a placeholder to represent the default catalog.
+    /// - When a client requests metadata for the "SPARK" catalog, the underlying API expects a null catalog name
+    ///   to trigger default catalog behavior. Passing "SPARK" directly would not return the expected results.
+    ///
+    /// What it does:
+    /// - If the CatalogName property is set to "SPARK" (case-insensitive), this method sets it to null.
+    /// - This ensures that downstream API calls behave as if no catalog was specified, returning default catalog metadata.
+    ///
+    /// This logic is required to maintain compatibility with legacy tools and standards that expect "SPARK" to act as a default catalog alias.
+    /// </summary>
+    private void HandleSparkCatalog()
+    {
+        if (CatalogName != null && CatalogName.Equals("SPARK", StringComparison.OrdinalIgnoreCase))
+        {
+            CatalogName = null;
+        }
+    }
+
+    /// <summary>
+    /// Overrides the GetCatalogsAsync method to handle EnableMultipleCatalogSupport flag.
+    /// When EnableMultipleCatalogSupport is false, returns a single catalog "SPARK" without making an RPC call.
+    /// When EnableMultipleCatalogSupport is true, delegates to the base class implementation to retrieve actual catalogs.
+    /// </summary>
+    /// <param name="cancellationToken">Cancellation token</param>
+    /// <returns>Query result containing catalog information</returns>
+    protected override async Task<QueryResult> GetCatalogsAsync(CancellationToken cancellationToken = default)
+    {
+        // If EnableMultipleCatalogSupport is false, return a single catalog "SPARK" without making an RPC call
+        if (enableMultipleCatalogSupport)
Review Comment:
Thanks for catching. Fixed.
##########
csharp/src/Drivers/Apache/Hive2/HiveServer2Statement.cs:
##########
@@ -352,7 +352,7 @@ protected void ValidateOptions(IReadOnlyDictionary<string, string> properties)
}
}
-    private async Task<QueryResult> ExecuteMetadataCommandQuery(CancellationToken cancellationToken)
+    protected virtual async Task<QueryResult> ExecuteMetadataCommandQuery(CancellationToken cancellationToken)
Review Comment:
Thanks for catching, this is not used anymore; will revert.
##########
csharp/src/Drivers/Databricks/DatabricksStatement.cs:
##########
@@ -151,5 +158,233 @@ internal void SetMaxBytesPerFile(long maxBytesPerFile)
{
this.maxBytesPerFile = maxBytesPerFile;
}
+
+    /// <summary>
+    /// Helper method to handle the special case for the "SPARK" catalog in metadata queries.
+    ///
+    /// Why:
+    /// - In Databricks, the legacy "SPARK" catalog is used as a placeholder to represent the default catalog.
+    /// - When a client requests metadata for the "SPARK" catalog, the underlying API expects a null catalog name
+    ///   to trigger default catalog behavior. Passing "SPARK" directly would not return the expected results.
+    ///
+    /// What it does:
+    /// - If the CatalogName property is set to "SPARK" (case-insensitive), this method sets it to null.
+    /// - This ensures that downstream API calls behave as if no catalog was specified, returning default catalog metadata.
+    ///
+    /// This logic is required to maintain compatibility with legacy tools and standards that expect "SPARK" to act as a default catalog alias.
+    /// </summary>
+    private void HandleSparkCatalog()
+    {
+        if (CatalogName != null && CatalogName.Equals("SPARK", StringComparison.OrdinalIgnoreCase))
+        {
+            CatalogName = null;
+        }
+    }
+
+    /// <summary>
+    /// Overrides the GetCatalogsAsync method to handle EnableMultipleCatalogSupport flag.
+    /// When EnableMultipleCatalogSupport is false, returns a single catalog "SPARK" without making an RPC call.
+    /// When EnableMultipleCatalogSupport is true, delegates to the base class implementation to retrieve actual catalogs.
+    /// </summary>
+    /// <param name="cancellationToken">Cancellation token</param>
+    /// <returns>Query result containing catalog information</returns>
+    protected override async Task<QueryResult> GetCatalogsAsync(CancellationToken cancellationToken = default)
+    {
+        // If EnableMultipleCatalogSupport is false, return a single catalog "SPARK" without making an RPC call
+        if (enableMultipleCatalogSupport)
+        {
+            // Create a schema with a single column TABLE_CAT
+            var field = new Field("TABLE_CAT", StringType.Default, true);
+            var schema = new Schema(new[] { field }, null);
+
+            // Create a single row with value "SPARK"
+            var builder = new StringArray.Builder();
+            builder.Append("SPARK");
+            var array = builder.Build();
+
+            // Return the result without making an RPC call
+            return new QueryResult(1, new HiveServer2Connection.HiveInfoArrowStream(schema, new[] { array }));
+        }
+
+        // If EnableMultipleCatalogSupport is true, delegate to base class implementation
+        return await base.GetCatalogsAsync(cancellationToken);
+    }
+
+    /// <summary>
+    /// Overrides the GetSchemasAsync method to handle the SPARK catalog case.
+    /// When EnableMultipleCatalogSupport is true:
+    /// - If catalog is "SPARK", sets catalogName to null in the API call
+    /// When EnableMultipleCatalogSupport is false:
+    /// - If catalog is not null or SPARK, returns empty result without RPC call
+    /// </summary>
+    /// <param name="cancellationToken">Cancellation token</param>
+    /// <returns>Query result containing schema information</returns>
+    protected override async Task<QueryResult> GetSchemasAsync(CancellationToken cancellationToken = default)
+    {
+        // Handle SPARK catalog case
+        HandleSparkCatalog();
+
+        // If EnableMultipleCatalogSupport is false and catalog is not null or SPARK, return empty result without RPC call
+        if (enableMultipleCatalogSupport && CatalogName != null)
+        {
+            // Create a schema with TABLE_CATALOG and TABLE_SCHEMA columns
Review Comment:
Fixed
##########
csharp/src/Drivers/Apache/Spark/SparkStatement.cs:
##########
@@ -52,5 +54,11 @@ protected override void SetStatementProperties(TExecuteStatementReq statement)
IntervalTypesAsArrow = false,
};
}
+
+    // Override ExecuteMetadataCommandQuery to allow derived classes to override it
+    protected override async Task<QueryResult> ExecuteMetadataCommandQuery(CancellationToken cancellationToken)
Review Comment:
Thanks for catching, this is not used anymore; will revert.
##########
csharp/src/Drivers/Databricks/DatabricksStatement.cs:
##########
@@ -151,5 +158,233 @@ internal void SetMaxBytesPerFile(long maxBytesPerFile)
{
this.maxBytesPerFile = maxBytesPerFile;
}
+
+    /// <summary>
+    /// Helper method to handle the special case for the "SPARK" catalog in metadata queries.
+    ///
+    /// Why:
+    /// - In Databricks, the legacy "SPARK" catalog is used as a placeholder to represent the default catalog.
+    /// - When a client requests metadata for the "SPARK" catalog, the underlying API expects a null catalog name
+    ///   to trigger default catalog behavior. Passing "SPARK" directly would not return the expected results.
+    ///
+    /// What it does:
+    /// - If the CatalogName property is set to "SPARK" (case-insensitive), this method sets it to null.
+    /// - This ensures that downstream API calls behave as if no catalog was specified, returning default catalog metadata.
+    ///
+    /// This logic is required to maintain compatibility with legacy tools and standards that expect "SPARK" to act as a default catalog alias.
+    /// </summary>
+    private void HandleSparkCatalog()
+    {
+        if (CatalogName != null && CatalogName.Equals("SPARK", StringComparison.OrdinalIgnoreCase))
+        {
+            CatalogName = null;
+        }
+    }
+
+    /// <summary>
+    /// Overrides the GetCatalogsAsync method to handle EnableMultipleCatalogSupport flag.
+    /// When EnableMultipleCatalogSupport is false, returns a single catalog "SPARK" without making an RPC call.
+    /// When EnableMultipleCatalogSupport is true, delegates to the base class implementation to retrieve actual catalogs.
+    /// </summary>
+    /// <param name="cancellationToken">Cancellation token</param>
+    /// <returns>Query result containing catalog information</returns>
+    protected override async Task<QueryResult> GetCatalogsAsync(CancellationToken cancellationToken = default)
+    {
Review Comment:
Nope, the GetCatalogs call does not take the catalogName as a parameter, so it's not needed.
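
For context on this reply: in the HiveServer2 Thrift interface that these metadata calls go through, the catalogs request carries only a session handle, while the schemas request also carries optional catalog/schema filters. The sketch below is illustrative only — the class names are hypothetical stand-ins, not the driver's generated Thrift types — but it shows why the SPARK-to-null translation matters for GetSchemasAsync and has nothing to act on in GetCatalogsAsync.

// Hypothetical sketch (not the arrow-adbc generated Thrift classes): the
// field layout mirrors the standard TCLIService definitions, where the
// catalogs request has no catalog-name filter but the schemas request does.
public sealed class SessionHandleSketch { }

public sealed class GetCatalogsRequestSketch
{
    // Only a session handle: there is no catalog name to translate,
    // so GetCatalogsAsync never needs the SPARK -> null handling.
    public SessionHandleSketch Session { get; } = new();
}

public sealed class GetSchemasRequestSketch
{
    public SessionHandleSketch Session { get; } = new();

    // Schemas can be filtered by catalog, so the legacy "SPARK" alias
    // must be mapped to null (default catalog) before the RPC is sent.
    public string? CatalogName { get; set; }
    public string? SchemaName { get; set; }
}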
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]