korbit-ai[bot] commented on code in PR #35983:
URL: https://github.com/apache/superset/pull/35983#discussion_r2490026636
########## superset/config.py: ##########
@@ -1739,6 +1739,25 @@ def EMAIL_HEADER_MUTATOR(  # pylint: disable=invalid-name,unused-argument  # noq
 # The text for call-to-action link in Alerts & Reports emails
 EMAIL_REPORTS_CTA = "Explore in Superset"
+# Azure Blob Storage configuration for report attachments
+# Connection string for Azure Blob Storage account
+# Example: "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
+AZURE_BLOB_CONNECTION_STRING: str | None = None
+
+# Container name for storing report attachments
+AZURE_BLOB_REPORTS_CONTAINER = "superset-reports"
+
+# Enable Azure Blob Storage for report attachments
+# When enabled, attachments will be uploaded to Azure Blob instead of being sent via email
+AZURE_BLOB_REPORTS_ENABLED = False
+
+# Include links to Azure Blob attachments in email notifications
+# When enabled, emails will include download links for attachments stored in Azure Blob
+AZURE_BLOB_REPORTS_INCLUDE_LINKS = True
+
+# SAS token expiry time in hours for Azure Blob attachment links (default: 7 days)
+AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS = 168

Review Comment:
### Missing validation for SAS token expiry hours

###### What is the issue?
The SAS token expiry time is hardcoded to 168 hours (7 days) without validation or bounds checking.

###### Why this matters
This could lead to security issues if an extremely long expiry time is set, or functional issues if a very short time is set that doesn't allow users enough time to download attachments.

###### Suggested change
Add validation to ensure the expiry time is within reasonable bounds. For example:

```python
# SAS token expiry time in hours for Azure Blob attachment links (default: 7 days)
# Valid range: 1-8760 hours (1 hour to 1 year)
AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS = max(1, min(8760, 168))
```

Or implement validation logic elsewhere in the application that uses this configuration.
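As a sketch of the second option — validating where the value is consumed rather than in `config.py` — a small clamp helper could look like this (the helper name and bounds are illustrative, not part of the PR):

```python
# Hypothetical helper: clamp the configured expiry at the point of use.
MIN_SAS_EXPIRY_HOURS = 1
MAX_SAS_EXPIRY_HOURS = 8760  # one year


def effective_sas_expiry_hours(app_config: dict) -> int:
    """Return AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS clamped to a sane range."""
    configured = app_config.get("AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS", 168)
    return max(MIN_SAS_EXPIRY_HOURS, min(MAX_SAS_EXPIRY_HOURS, int(configured)))
```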
########## superset/config.py: ##########
+# SAS token expiry time in hours for Azure Blob attachment links (default: 7 days)
+AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS = 168

Review Comment:
### Fixed SAS token expiry lacks flexibility

###### What is the issue?
Hard-coded SAS token expiry time of 168 hours (7 days) may be inefficient for frequently accessed attachments, as it doesn't allow for dynamic adjustment based on usage patterns.

###### Why this matters
This fixed expiry time could lead to unnecessary token regeneration for short-lived attachments or security risks for long-lived ones. A configurable or usage-based expiry would be more efficient.

###### Suggested change
Consider making this configurable per report type or implementing dynamic expiry based on attachment access patterns. For example:

```python
# Allow callable for dynamic expiry calculation
AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS: int | Callable[[str], int] = 168
```

Or provide different defaults for different report types.
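If the setting were widened to accept a callable, the code that generates SAS links would need a small resolution step. A minimal sketch, assuming the callable receives a report-type string (that parameter and the helper name are assumptions, not anything in this PR):

```python
from typing import Callable, Union

ExpiryHours = Union[int, Callable[[str], int]]


def resolve_expiry_hours(configured: ExpiryHours, report_type: str) -> int:
    """Return a concrete number of hours whether the setting is an int or a callable."""
    return configured(report_type) if callable(configured) else configured


# Example: shorter-lived links for dashboard screenshots than for CSV exports
hours = resolve_expiry_hours(lambda kind: 24 if kind == "dashboard" else 168, "dashboard")
```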
########## superset/config.py: ##########

Review Comment:
### Storage Configuration Tight Coupling

###### What is the issue?
The Azure Blob Storage configuration is tightly coupled with the main config file, violating the Single Responsibility Principle. Storage configurations should be modular and separate from core application settings.

###### Why this matters
This tight coupling makes it harder to maintain, test and modify storage configurations independently. It also makes it more difficult to add support for other storage providers in the future.

###### Suggested change
Extract Azure Blob Storage configuration into a separate storage configuration module:

```python
# storage_config.py
class StorageConfig:
    def __init__(self):
        self.provider = 'azure'
        self.connection_string = None
        self.container = 'superset-reports'
        self.enabled = False
        self.include_links = True
        self.sas_expiry_hours = 168


# config.py
from storage_config import StorageConfig

STORAGE_CONFIG = StorageConfig()
```

########## superset/utils/azure_blob.py: ##########
@@ -0,0 +1,206 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Utility functions for Azure Blob Storage operations.
+Used to store report/alert attachments in Azure Blob Storage.
+"""
+
+import logging
+from datetime import datetime, timedelta
+from typing import Optional
+
+from flask import current_app
+
+logger = logging.getLogger(__name__)
+
+
+def upload_to_azure_blob(
+    file_data: bytes,
+    file_name: str,
+    content_type: str = "application/octet-stream",
+) -> Optional[str]:
+    """
+    Upload a file to Azure Blob Storage and return the blob URL.
+
+    :param file_data: The file content as bytes
+    :param file_name: The name of the file to store
+    :param content_type: MIME type of the file
+    :return: The URL of the uploaded blob, or None if upload fails
+    """
+    try:
+        # Import azure-storage-blob only when needed to avoid hard dependency
+        from azure.storage.blob import BlobServiceClient, ContentSettings
+
+        # Get Azure Blob Storage configuration from app config
+        connection_string = current_app.config.get("AZURE_BLOB_CONNECTION_STRING")
+        container_name = current_app.config.get(
+            "AZURE_BLOB_REPORTS_CONTAINER", "superset-reports"
+        )
+
+        if not connection_string:
+            logger.warning(
+                "Azure Blob Storage is not configured. "
+                "Set AZURE_BLOB_CONNECTION_STRING in config."
+            )
+            return None
+
+        # Create BlobServiceClient
+        blob_service_client = BlobServiceClient.from_connection_string(
+            connection_string
+        )
+
+        # Get container client (create if doesn't exist)
+        container_client = blob_service_client.get_container_client(container_name)
+        if not container_client.exists():
+            container_client.create_container()
+            logger.info(f"Created Azure Blob container: {container_name}")
+
+        # Generate unique blob name with timestamp
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        blob_name = f"{timestamp}_{file_name}"
+
+        # Upload the blob
+        blob_client = container_client.get_blob_client(blob_name)
+        blob_client.upload_blob(
+            file_data,
+            overwrite=True,
+            content_settings=ContentSettings(content_type=content_type),
+        )
+
+        blob_url = blob_client.url
+        logger.info(f"Successfully uploaded file to Azure Blob: {blob_url}")
+        return blob_url
+
+    except ImportError:
+        logger.error(
+            "azure-storage-blob package is not installed. "
+            "Install it with: pip install azure-storage-blob"
+        )
+        return None
+    except Exception as ex:
+        logger.error(f"Failed to upload file to Azure Blob Storage: {str(ex)}")
+        return None
+
+
+def generate_blob_sas_url(
+    blob_url: str, expiry_hours: int = 168
+) -> Optional[str]:  # 7 days default
+    """
+    Generate a SAS (Shared Access Signature) URL for a blob with read permissions.
+    This allows temporary access to the blob without making it public.
+
+    :param blob_url: The URL of the blob
+    :param expiry_hours: Number of hours until the SAS token expires (default 7 days)
+    :return: The blob URL with SAS token, or original URL if SAS generation fails
+    """
+    try:
+        from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
+        from datetime import timezone
+
+        connection_string = current_app.config.get("AZURE_BLOB_CONNECTION_STRING")
+        if not connection_string:
+            logger.warning("Azure Blob Storage connection string not configured")
+            return blob_url
+
+        # Parse connection string to get account name and key
+        conn_dict = dict(item.split("=", 1) for item in connection_string.split(";") if "=" in item)
+        account_name = conn_dict.get("AccountName")
+        account_key = conn_dict.get("AccountKey")
+
+        if not account_name or not account_key:
+            logger.warning("Could not extract account credentials from connection string")
+            return blob_url
+
+        # Extract container and blob name from URL
+        # URL format: https://{account}.blob.core.windows.net/{container}/{blob}
+        url_parts = blob_url.replace(f"https://{account_name}.blob.core.windows.net/", "").split("/", 1)
+        if len(url_parts) != 2:
+            logger.warning(f"Could not parse blob URL: {blob_url}")
+            return blob_url
+
+        container_name, blob_name = url_parts
+
+        # Generate SAS token
+        sas_token = generate_blob_sas(
+            account_name=account_name,
+            container_name=container_name,
+            blob_name=blob_name,
+            account_key=account_key,
+            permission=BlobSasPermissions(read=True),
+            expiry=datetime.now(timezone.utc) + timedelta(hours=expiry_hours),
+        )
+
+        sas_url = f"{blob_url}?{sas_token}"
+        return sas_url
+
+    except ImportError:
+        logger.error("azure-storage-blob package is not installed")
+        return blob_url
+    except Exception as ex:
+        logger.error(f"Failed to generate SAS URL: {str(ex)}")
+        return blob_url
+
+
+def upload_report_attachments(
+    csv_data: Optional[bytes] = None,
+    pdf_data: Optional[bytes] = None,
+    screenshots: Optional[list[bytes]] = None,
+    report_name: str = "report",
+) -> dict[str, str]:
+    """
+    Upload multiple report attachments to Azure Blob Storage.
+
+    :param csv_data: CSV file data
+    :param pdf_data: PDF file data
+    :param screenshots: List of screenshot images
+    :param report_name: Base name for the report files
+    :return: Dictionary with attachment types as keys and blob URLs as values
+    """
+    urls = {}
+
+    # Upload CSV if available
+    if csv_data:
+        csv_url = upload_to_azure_blob(
+            csv_data, f"{report_name}.csv", "text/csv"
+        )
+        if csv_url:
+            urls["csv"] = generate_blob_sas_url(csv_url) or csv_url
+
+    # Upload PDF if available
+    if pdf_data:
+        pdf_url = upload_to_azure_blob(
+            pdf_data, f"{report_name}.pdf", "application/pdf"
+        )
+        if pdf_url:
+            urls["pdf"] = generate_blob_sas_url(pdf_url) or pdf_url
+
+    # Upload screenshots if available
+    if screenshots:
+        screenshot_urls = []
+        for idx, screenshot in enumerate(screenshots, 1):
+            screenshot_url = upload_to_azure_blob(
+                screenshot, f"{report_name}_screenshot_{idx}.png", "image/png"
+            )
+            if screenshot_url:
+                sas_url = generate_blob_sas_url(screenshot_url)
+                screenshot_urls.append(sas_url or screenshot_url)

Review Comment:
### Sequential screenshot uploads

###### What is the issue?
Screenshots are uploaded sequentially one by one, creating multiple individual network calls instead of batching operations.

###### Why this matters
Sequential uploads create unnecessary latency and don't take advantage of potential parallelization or batching capabilities, leading to poor performance when uploading multiple screenshots.

###### Suggested change
Use concurrent execution with `concurrent.futures.ThreadPoolExecutor` or `asyncio` to upload screenshots in parallel, or investigate Azure SDK batch upload capabilities to reduce total upload time.
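For illustration, a thread-pool sketch that reuses the PR's `upload_to_azure_blob` and `generate_blob_sas_url` helpers (the function name and worker count are assumptions; note that those helpers read `current_app.config`, so each worker thread would need the Flask application context pushed, which this sketch glosses over):

```python
from concurrent.futures import ThreadPoolExecutor

from superset.utils.azure_blob import generate_blob_sas_url, upload_to_azure_blob


def upload_screenshots_parallel(
    screenshots: list[bytes], report_name: str, max_workers: int = 4
) -> list[str]:
    """Upload screenshots concurrently; return the (SAS) URLs of successful uploads in order."""

    def upload_one(numbered: tuple[int, bytes]) -> str | None:
        idx, data = numbered
        url = upload_to_azure_blob(
            data, f"{report_name}_screenshot_{idx}.png", "image/png"
        )
        if not url:
            return None
        return generate_blob_sas_url(url) or url

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(upload_one, enumerate(screenshots, 1)))
    return [url for url in results if url]
```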
########## superset/config.py: ##########
+# Enable Azure Blob Storage for report attachments
+# When enabled, attachments will be uploaded to Azure Blob instead of being sent via email
+AZURE_BLOB_REPORTS_ENABLED = False
+
+# Include links to Azure Blob attachments in email notifications
+# When enabled, emails will include download links for attachments stored in Azure Blob
+AZURE_BLOB_REPORTS_INCLUDE_LINKS = True

Review Comment:
### Inconsistent Azure Blob configuration defaults

###### What is the issue?
The configuration allows AZURE_BLOB_REPORTS_INCLUDE_LINKS to be True while AZURE_BLOB_REPORTS_ENABLED is False, creating a logical inconsistency.

###### Why this matters
If Azure Blob reports are disabled, including links to Azure Blob attachments would be meaningless and could cause runtime errors or confusing behavior when the system tries to generate links for non-existent blob storage.

###### Suggested change
Either set both to False by default, or add logic to automatically disable link inclusion when blob reports are disabled:

```python
# Enable Azure Blob Storage for report attachments
AZURE_BLOB_REPORTS_ENABLED = False

# Include links to Azure Blob attachments in email notifications
# Automatically disabled when AZURE_BLOB_REPORTS_ENABLED is False
AZURE_BLOB_REPORTS_INCLUDE_LINKS = False
```

########## superset/utils/azure_blob.py: ##########
+def upload_report_attachments(
+    csv_data: Optional[bytes] = None,
+    pdf_data: Optional[bytes] = None,
+    screenshots: Optional[list[bytes]] = None,
+    report_name: str = "report",
+) -> dict[str, str]:

Review Comment:
### Repetitive Upload Logic

###### What is the issue?
The upload_report_attachments function handles multiple types of attachments with similar upload logic, leading to code duplication and reduced maintainability.

###### Why this matters
The repetitive pattern of uploading different file types violates the DRY principle and makes it harder to modify the upload behavior consistently across all file types.

###### Suggested change
Refactor to use a common upload handler:

```python
def upload_attachment(data: bytes, file_name: str, content_type: str) -> Optional[str]:
    url = upload_to_azure_blob(data, file_name, content_type)
    return generate_blob_sas_url(url) if url else None


def upload_report_attachments(...):
    attachments = {
        'csv': (csv_data, f'{report_name}.csv', 'text/csv'),
        'pdf': (pdf_data, f'{report_name}.pdf', 'application/pdf')
    }
    return {k: upload_attachment(*v) for k, v in attachments.items() if v[0]}
```
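One gap in the sketch above is the screenshots list, which the PR handles in a separate loop. A possible way to fold it into the same helper — keyed per screenshot, which is an assumption, since the quoted hunk cuts off before showing how screenshot URLs are returned (imports are shown only for self-containment; in the PR these helpers live in the same module):

```python
from typing import Optional

from superset.utils.azure_blob import generate_blob_sas_url, upload_to_azure_blob


def upload_attachment(data: bytes, file_name: str, content_type: str) -> Optional[str]:
    url = upload_to_azure_blob(data, file_name, content_type)
    return generate_blob_sas_url(url) if url else None


def upload_report_attachments(
    csv_data: Optional[bytes] = None,
    pdf_data: Optional[bytes] = None,
    screenshots: Optional[list[bytes]] = None,
    report_name: str = "report",
) -> dict[str, str]:
    # Describe every attachment the same way: (data, file name, MIME type)
    attachments = {
        "csv": (csv_data, f"{report_name}.csv", "text/csv"),
        "pdf": (pdf_data, f"{report_name}.pdf", "application/pdf"),
    }
    for idx, shot in enumerate(screenshots or [], 1):
        attachments[f"screenshot_{idx}"] = (
            shot,
            f"{report_name}_screenshot_{idx}.png",
            "image/png",
        )

    urls: dict[str, str] = {}
    for key, (data, file_name, content_type) in attachments.items():
        if data:
            url = upload_attachment(data, file_name, content_type)
            if url:
                urls[key] = url
    return urls
```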
########## superset/utils/azure_blob.py: ##########
+def generate_blob_sas_url(

Review Comment:
### Missing Storage Abstraction Layer

###### What is the issue?
The Azure Blob Storage operations are tightly coupled with direct Azure SDK calls and configuration retrieval, making it difficult to test and modify the storage implementation.

###### Why this matters
The tight coupling makes it challenging to unit test the code, switch to different storage providers, or mock the storage operations for testing. It also violates the Dependency Inversion Principle.

###### Suggested change
Create an abstract storage interface and move the Azure implementation to a concrete class:

```python
class StorageProvider(ABC):
    @abstractmethod
    def upload_file(self, file_data: bytes, file_name: str, content_type: str) -> Optional[str]:
        pass

    @abstractmethod
    def generate_signed_url(self, file_url: str, expiry_hours: int) -> Optional[str]:
        pass


class AzureBlobStorage(StorageProvider):
    def __init__(self, connection_string: str, container_name: str):
        self.connection_string = connection_string
        self.container_name = container_name
```
########## superset/utils/azure_blob.py: ##########
+        # Parse connection string to get account name and key
+        conn_dict = dict(item.split("=", 1) for item in connection_string.split(";") if "=" in item)

Review Comment:
### Connection string parsing vulnerable to crash

###### What is the issue?
The connection string parsing logic will crash if any item contains multiple '=' characters after the first one, as split('=', 1) creates exactly 2 elements, but the dict() constructor expects key-value pairs.

###### Why this matters
If a connection string contains values with '=' characters (like in base64 encoded keys), the parsing will fail with a ValueError when trying to create the dictionary, causing the function to crash.

###### Suggested change
Use a more robust parsing approach:

```python
conn_dict = {}
for item in connection_string.split(";"):
    if "=" in item:
        key, value = item.split("=", 1)
        conn_dict[key] = value
```
########## superset/utils/azure_blob.py: ##########
+        # Parse connection string to get account name and key
+        conn_dict = dict(item.split("=", 1) for item in connection_string.split(";") if "=" in item)

Review Comment:
### Repeated connection string parsing

###### What is the issue?
The connection string is parsed every time generate_blob_sas_url is called in order to extract the account credentials.

###### Why this matters
This string parsing operation is repeated for every SAS URL generation, creating unnecessary computational overhead when generating multiple SAS URLs.

###### Suggested change
Cache the parsed connection string credentials using the `@lru_cache` decorator, or store them in module-level variables after the first parse, to avoid repeated string operations.
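A minimal sketch of the `@lru_cache` option (the helper name is an assumption; keying the cache on the connection string itself keeps it correct if the config changes between calls):

```python
from functools import lru_cache


@lru_cache(maxsize=4)
def _parse_account_credentials(connection_string: str) -> tuple[str | None, str | None]:
    """Parse AccountName/AccountKey out of the connection string once and cache the result."""
    parts: dict[str, str] = {}
    for item in connection_string.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parts[key] = value
    return parts.get("AccountName"), parts.get("AccountKey")


# Inside generate_blob_sas_url:
# account_name, account_key = _parse_account_credentials(connection_string)
```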
########## superset/utils/azure_blob.py: ##########
+        # Get container client (create if doesn't exist)
+        container_client = blob_service_client.get_container_client(container_name)
+        if not container_client.exists():
+            container_client.create_container()

Review Comment:
### Inefficient client creation and container checks

###### What is the issue?
A new BlobServiceClient is created for every file upload, and container existence is checked on each upload operation.

###### Why this matters
This creates unnecessary overhead when uploading multiple files, as the client connection and container existence check are repeated for each operation instead of being reused.

###### Suggested change
Create a module-level cached BlobServiceClient and cache container existence status. Use `@lru_cache` or a simple module-level variable to store the client and container status to avoid repeated initialization and existence checks.
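A sketch of that caching, assuming the connection string and container name rarely change at runtime (the helper name is illustrative, and the PR deliberately defers the `azure.storage.blob` import, so a real implementation may want to keep that import lazy):

```python
from functools import lru_cache

# The PR imports azure-storage-blob lazily; shown at module level here only for brevity.
from azure.storage.blob import BlobServiceClient, ContainerClient


@lru_cache(maxsize=4)
def _get_container_client(connection_string: str, container_name: str) -> ContainerClient:
    """Build the service/container clients once per (connection string, container) pair."""
    service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = service_client.get_container_client(container_name)
    # The existence check (and creation) now happens only on the first call per process.
    if not container_client.exists():
        container_client.create_container()
    return container_client
```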
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
