bito-code-review[bot] commented on code in PR #35983: URL: https://github.com/apache/superset/pull/35983#discussion_r2490075899
########## requirements/base.in: ########## @@ -36,3 +36,6 @@ marshmallow-sqlalchemy>=1.3.0,<1.4.1 # needed for python 3.12 support openapi-schema-validator>=0.6.3 + +# Azure Blob Storage for report attachments +azure-storage-blob>=12.0.0 Review Comment: <div> <div id="suggestion"> <div id="issue"><b>Security vulnerability in azure-storage-blob dependency</b></div> <div id="fix"> The added azure-storage-blob>=12.0.0 introduces a known security vulnerability CVE-2022-30187, which is a CBC Padding Oracle attack that can lead to information disclosure and privilege escalation in environments using AES-CBC encryption. This affects the `azure-storage-blob` library when handling encrypted blobs, potentially compromising report attachments stored in Azure Blob Storage. Upgrade to >=12.13.0 to mitigate this risk, as the vulnerability is fixed in that version. </div> <details> <summary> <b>Code suggestion</b> </summary> <blockquote>Check the AI-generated fix before applying</blockquote> <div id="code"> ````suggestion # Azure Blob Storage for report attachments azure-storage-blob>=12.13.0 ```` </div> </details> </div> <small><i>Code Review Run <a href=https://github.com/apache/superset/pull/35983#issuecomment-3485329321>#e937ac</a></i></small> </div> --- Should Bito avoid suggestions like this for future reviews? (<a href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>) - [ ] Yes, avoid them ########## superset/reports/notifications/email.py: ########## @@ -147,15 +136,59 @@ def _get_content(self) -> EmailContent: else: html_table = "" - img_tags = [] - for msgid in images.keys(): - img_tags.append( - f"""<div class="image"> - <img width="1000" src="cid:{msgid}"> - </div> - """ + # Handle Azure Blob Storage for attachments Review Comment: <div> <div id="suggestion"> <div id="issue"><b>Breaking change removing email attachments when Azure Blob disabled</b></div> <div id="fix"> This change removes email attachments and embedded images entirely when AZURE_BLOB_REPORTS_ENABLED is False, breaking backward compatibility for users not using Azure Blob Storage. Previously, reports included attachments and embedded screenshots in emails. Now, when the config is disabled, no attachments are sent and images aren't displayed in the email body. This impacts the `_get_content` method which is called during email notification sending, affecting downstream consumers like report recipients who expect to receive attachments. To fix, add conditional logic to preserve the old attachment and embedding behavior when Azure Blob is not enabled, ensuring users continue to get attachments via email as before. </div> <details> <summary> <b>Code suggestion</b> </summary> <blockquote>Check the AI-generated fix before applying</blockquote> <div id="code"> ``` - # Handle Azure Blob Storage for attachments - azure_links_html = "" - if current_app.config.get("AZURE_BLOB_REPORTS_ENABLED", False): - # Upload attachments to Azure Blob Storage - from superset.utils.azure_blob import upload_report_attachments - - blob_urls = upload_report_attachments( - csv_data=self._content.csv, - pdf_data=self._content.pdf, - screenshots=self._content.screenshots, - report_name=self._name, - ) - - # Generate download links section if configured - if blob_urls and current_app.config.get( - "AZURE_BLOB_REPORTS_INCLUDE_LINKS", True - ): - links = [] - if "csv" in blob_urls: - links.append( - f'<li><a href="{blob_urls["csv"]}">Download CSV</a></li>' - ) - if "pdf" in blob_urls: - links.append( - f'<li><a href="{blob_urls["pdf"]}">Download PDF</a></li>' - ) - if "screenshots" in blob_urls: - for idx, url in enumerate(blob_urls["screenshots"], 1): - links.append( - f'<li><a href="{url}">Download Screenshot {idx}</a></li>' - ) - - if links: - azure_links_html = """ - <div style="margin-top: 20px; padding: 15px; background-color: #f5f5f5; border-radius: 5px;"> - <h3 style="margin-top: 0; color: #42526E;">Report Attachments</h3> - <ul style="list-style-type: none; padding-left: 0;"> - {} - </ul> - <p style="font-size: 12px; color: #6B778C; margin-bottom: 0;"> - <em>Links expire in {} days</em> - </p> - </div> - """.format( - "".join(links), - int( - current_app.config.get( - "AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS", 168 - ) - / 24 - ), - ) + # Get the domain from the 'From' address + # and make a message id without the < > in the end + domain = self._get_smtp_domain() + images = {} + img_tag = "" + azure_links_html = "" + + if current_app.config.get("AZURE_BLOB_REPORTS_ENABLED", False): + # Handle Azure Blob Storage for attachments + from superset.utils.azure_blob import upload_report_attachments + + blob_urls = upload_report_attachments( + csv_data=self._content.csv, + pdf_data=self._content.pdf, + screenshots=self._content.screenshots, + report_name=self._name, + ) + + # Generate download links section if configured + if blob_urls and current_app.config.get( + "AZURE_BLOB_REPORTS_INCLUDE_LINKS", True - ): + links = [] + if "csv" in blob_urls: + links.append( + f'<li><a href="{blob_urls["csv"]}">Download CSV</a></li>' + ) + if "pdf" in blob_urls: + links.append( + f'<li><a href="{blob_urls["pdf"]}">Download PDF</a></li>' + ) + if "screenshots" in blob_urls: + for idx, url in enumerate(blob_urls["screenshots"], 1): + links.append( + f'<li><a href="{url">Download Screenshot {idx}</a></li>' + ) + + if links: + azure_links_html = """ + <div style="margin-top: 20px; padding: 15px; background-color: #f5f5f5; border-radius: 5px;"> + <h3 style="margin-top: 0; color: #42526E;">Report Attachments</h3> + <ul style="list-style-type: none; padding-left: 0;"> - {} + {} + </ul> + <p style="font-size: 12px; color: #6B778C; margin-bottom: 0;"> + <em>Links expire in {} days</em> + </p> + </div> + """.format( + "".join(links), + int( + current_app.config.get( + "AZURE_BLOB_REPORTS_SAS_EXPIRY_HOURS", 168 + ) + / 24 + ), + ) + else: + # Traditional email with embedded images when Azure Blob is disabled + if self._content.screenshots: + images = { + make_msgid(domain)[1:-1]: screenshot + for screenshot in self._content.screenshots + } + + # Generate image tags for embedded screenshots + if images: + img_tags = [] + for cid in images: + img_tags.append(f'<img src="cid:{cid}" class="image">') + img_tag = "<br>".join(img_tags) @@ -217,1 +217,2 @@ - {html_table} + {html_table} + {img_tag} @@ -203,6 +203,10 @@ - } - a { - color: #0052CC; - text-decoration: none; - } - a:hover { - text-decoration: underline; - } + } + .image{ + margin-bottom: 18px; + min-width: 1000px; + } + a { + color: #0052CC; + text-decoration: none; + } + a:hover { + text-decoration: underline; + } @@ -224,6 +224,25 @@ - # Attachments are not sent via email - either stored in Azure Blob or disabled - return EmailContent( - body=body, - images=None, # No images attached - pdf=None, # No PDF attachments - data=None, # No CSV attachments - header_data=self._content.header_data, - ) + # Handle attachments based on Azure Blob configuration + if current_app.config.get("AZURE_BLOB_REPORTS_ENABLED", False): + # Attachments are stored in Azure Blob - don't send via email + return EmailContent( + body=body, + images=None, + pdf=None, + data=None, + header_data=self._content.header_data, + ) + else: + # Preserve backward compatibility - send attachments via email + csv_data = None + if self._content.csv: + csv_data = {__("%(name)s.csv", name=self._name): self._content.csv} + + pdf_data = None + if self._content.pdf: + pdf_data = {__("%(name)s.pdf", name=self._name): self._content.pdf} + + return EmailContent( + body=body, + images=images, + pdf=pdf_data, + data=csv_data, + header_data=self._content.header_data, + ) ``` </div> </details> </div> <small><i>Code Review Run <a href=https://github.com/apache/superset/pull/35983#issuecomment-3485329321>#e937ac</a></i></small> </div> --- Should Bito avoid suggestions like this for future reviews? (<a href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>) - [ ] Yes, avoid them ########## superset/utils/azure_blob.py: ########## @@ -0,0 +1,206 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +""" +Utility functions for Azure Blob Storage operations. +Used to store report/alert attachments in Azure Blob Storage. +""" + +import logging +from datetime import datetime, timedelta +from typing import Optional + +from flask import current_app + +logger = logging.getLogger(__name__) + + +def upload_to_azure_blob( + file_data: bytes, + file_name: str, + content_type: str = "application/octet-stream", +) -> Optional[str]: + """ + Upload a file to Azure Blob Storage and return the blob URL. + + :param file_data: The file content as bytes + :param file_name: The name of the file to store + :param content_type: MIME type of the file + :return: The URL of the uploaded blob, or None if upload fails + """ + try: + # Import azure-storage-blob only when needed to avoid hard dependency + from azure.storage.blob import BlobServiceClient, ContentSettings + + # Get Azure Blob Storage configuration from app config + connection_string = current_app.config.get("AZURE_BLOB_CONNECTION_STRING") + container_name = current_app.config.get( + "AZURE_BLOB_REPORTS_CONTAINER", "superset-reports" + ) + + if not connection_string: + logger.warning( + "Azure Blob Storage is not configured. " + "Set AZURE_BLOB_CONNECTION_STRING in config." + ) + return None + + # Create BlobServiceClient + blob_service_client = BlobServiceClient.from_connection_string( + connection_string + ) + + # Get container client (create if doesn't exist) + container_client = blob_service_client.get_container_client(container_name) + if not container_client.exists(): + container_client.create_container() + logger.info(f"Created Azure Blob container: {container_name}") + + # Generate unique blob name with timestamp + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + blob_name = f"{timestamp}_{file_name}" + + # Upload the blob + blob_client = container_client.get_blob_client(blob_name) + blob_client.upload_blob( + file_data, + overwrite=True, + content_settings=ContentSettings(content_type=content_type), + ) + + blob_url = blob_client.url + logger.info(f"Successfully uploaded file to Azure Blob: {blob_url}") + return blob_url + + except ImportError: + logger.error( + "azure-storage-blob package is not installed. " + "Install it with: pip install azure-storage-blob" + ) + return None + except Exception as ex: + logger.error(f"Failed to upload file to Azure Blob Storage: {str(ex)}") + return None + + +def generate_blob_sas_url( + blob_url: str, expiry_hours: int = 168 +) -> Optional[str]: # 7 days default + """ + Generate a SAS (Shared Access Signature) URL for a blob with read permissions. + This allows temporary access to the blob without making it public. + + :param blob_url: The URL of the blob + :param expiry_hours: Number of hours until the SAS token expires (default 7 days) + :return: The blob URL with SAS token, or original URL if SAS generation fails + """ + try: + from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions + from datetime import timezone + + connection_string = current_app.config.get("AZURE_BLOB_CONNECTION_STRING") + if not connection_string: + logger.warning("Azure Blob Storage connection string not configured") + return blob_url + + # Parse connection string to get account name and key + conn_dict = dict(item.split("=", 1) for item in connection_string.split(";") if "=" in item) + account_name = conn_dict.get("AccountName") + account_key = conn_dict.get("AccountKey") + + if not account_name or not account_key: + logger.warning("Could not extract account credentials from connection string") + return blob_url + + # Extract container and blob name from URL + # URL format: https://{account}.blob.core.windows.net/{container}/{blob} + url_parts = blob_url.replace(f"https://{account_name}.blob.core.windows.net/", "").split("/", 1) + if len(url_parts) != 2: + logger.warning(f"Could not parse blob URL: {blob_url}") + return blob_url + + container_name, blob_name = url_parts + + # Generate SAS token + sas_token = generate_blob_sas( + account_name=account_name, + container_name=container_name, + blob_name=blob_name, + account_key=account_key, + permission=BlobSasPermissions(read=True), + expiry=datetime.now(timezone.utc) + timedelta(hours=expiry_hours), + ) + + sas_url = f"{blob_url}?{sas_token}" + return sas_url + + except ImportError: + logger.error("azure-storage-blob package is not installed") + return blob_url + except Exception as ex: Review Comment: <div> <div id="suggestion"> <div id="issue"><b>Avoid catching broad exception types</b></div> <div id="fix"> Catching `Exception` is too broad and may hide unexpected errors. Consider catching specific exception types like `AzureError` or `ClientError` instead. </div> <details> <summary> <b>Code suggestion</b> </summary> <blockquote>Check the AI-generated fix before applying</blockquote> <div id="code"> ````suggestion except (ImportError, ValueError, RuntimeError) as ex: ```` </div> </details> </div> <small><i>Code Review Run <a href=https://github.com/apache/superset/pull/35983#issuecomment-3485329321>#e937ac</a></i></small> </div> --- Should Bito avoid suggestions like this for future reviews? (<a href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>) - [ ] Yes, avoid them -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
