codeant-ai-for-open-source[bot] commented on code in PR #36678: URL: https://github.com/apache/superset/pull/36678#discussion_r2624103486
########## superset-frontend/src/pages/DatasourceConnector/components/DataSourcePanel.tsx: ########## @@ -0,0 +1,185 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +import { t } from '@superset-ui/core'; +import { styled } from '@apache-superset/core/ui'; +import { + Button, + Divider, + Flex, + Icons, + Typography, +} from '@superset-ui/core/components'; +import type { DatabaseObject } from 'src/components/DatabaseSelector/types'; +import DatabaseSchemaPicker from './DatabaseSchemaPicker'; + +interface DataSourcePanelProps { + database: DatabaseObject | null; + catalog: string | null; + schema: string | null; + isSubmitting: boolean; + onDatabaseChange: (db: DatabaseObject | null) => void; + onCatalogChange: (catalog: string | null) => void; + onSchemaChange: (schema: string | null) => void; + onError: (msg: string) => void; + onAddNewDatabase: () => void; + onCancel: () => void; + onContinue: () => void; +} + +const PanelContainer = styled.div` + ${({ theme }) => ` + width: 100%; + max-width: 600px; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingLG}px; + `} +`; + +const IconCircle = styled.div` + ${({ theme }) => ` + width: ${theme.sizeUnit * 8}px; + height: ${theme.sizeUnit * 8}px; + flex-shrink: 0; Review Comment: **Suggestion:** The `IconCircle` styled component redundantly sets `width`, `height`, and `flex-shrink` twice with conflicting values, which is confusing, error-prone, and makes future style changes harder to reason about since only the later declarations actually take effect. [code quality] **Severity Level:** Minor ⚠️ ```suggestion ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The duplicated width/height/flex-shrink lines are redundant and confusing — the later declarations override earlier ones. Collapsing them to a single declaration clarifies intent and avoids accidental styling mistakes. This is a harmless refactor that improves maintainability. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/components/DataSourcePanel.tsx **Line:** 58:60 **Comment:** *Code Quality: The `IconCircle` styled component redundantly sets `width`, `height`, and `flex-shrink` twice with conflicting values, which is confusing, error-prone, and makes future style changes harder to reason about since only the later declarations actually take effect. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/components/DataSourcePanel.tsx: ########## @@ -0,0 +1,185 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +import { t } from '@superset-ui/core'; +import { styled } from '@apache-superset/core/ui'; +import { + Button, + Divider, + Flex, + Icons, + Typography, +} from '@superset-ui/core/components'; +import type { DatabaseObject } from 'src/components/DatabaseSelector/types'; +import DatabaseSchemaPicker from './DatabaseSchemaPicker'; + +interface DataSourcePanelProps { + database: DatabaseObject | null; + catalog: string | null; + schema: string | null; + isSubmitting: boolean; + onDatabaseChange: (db: DatabaseObject | null) => void; + onCatalogChange: (catalog: string | null) => void; + onSchemaChange: (schema: string | null) => void; + onError: (msg: string) => void; + onAddNewDatabase: () => void; + onCancel: () => void; + onContinue: () => void; +} + +const PanelContainer = styled.div` + ${({ theme }) => ` + width: 100%; + max-width: 600px; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingLG}px; + `} +`; + +const IconCircle = styled.div` + ${({ theme }) => ` + width: ${theme.sizeUnit * 8}px; + height: ${theme.sizeUnit * 8}px; + flex-shrink: 0; + width: ${theme.sizeUnit * 6}px; + height: ${theme.sizeUnit * 6}px; + border-radius: ${theme.borderRadius}px; + display: flex; + align-items: center; + justify-content: center; + background-color: ${theme.colorPrimaryBg}; + flex-shrink: 0; + `} +`; + +const FormSection = styled.div` + ${({ theme }) => ` + margin-bottom: ${theme.marginMD}px; + `} +`; + +const StyledDivider = styled(Divider)` + ${({ theme }) => ` + margin: ${theme.marginLG}px 0; + + .ant-divider-inner-text { + font-size: ${theme.fontSizeSM}px; + color: ${theme.colorTextSecondary}; + } + `} +`; + +const AddDatabaseButton = styled(Button)` + ${({ theme }) => ` + width: 100%; + margin-bottom: ${theme.marginLG}px; + `} +`; + +const FooterActionsWrapper = styled.div` + ${({ theme }) => ` + padding-top: ${theme.paddingMD}px; + border-top: 1px solid ${theme.colorBorder}; + `} +`; + +const PanelHeader = styled(Flex)` + ${({ theme }) => ` + margin-bottom: ${theme.marginLG}px; + `} +`; + +const TitleRow = styled(Flex)` + ${({ theme }) => ` + margin-bottom: ${theme.marginXS}px; + align-items: baseline; + `} +`; + +export default function DataSourcePanel({ + database, + catalog, + schema, + isSubmitting, + onDatabaseChange, + onCatalogChange, + onSchemaChange, + onError, + onAddNewDatabase, + onCancel, + onContinue, +}: DataSourcePanelProps) { + const canContinue = database !== null && schema !== null && !isSubmitting; Review Comment: **Suggestion:** The enable/disable logic for the Continue button only checks that the schema is not null, while the caller later treats an empty string as "not selected" (using a generic falsy check), which can allow the button to be enabled even though the subsequent step will immediately reject the action and show an error instead of progressing. [logic error] **Severity Level:** Minor ⚠️ ```suggestion const canContinue = !!database && !!schema && !isSubmitting; ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The current check only guards against null but allows empty strings (type is string | null). Using truthiness (!!database && !!schema) prevents enabling Continue when schema is an empty string, avoiding a race where the next step rejects the value and surfaces an error. This is a small behavioral correctness fix, not just stylistic. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/components/DataSourcePanel.tsx **Line:** 129:129 **Comment:** *Logic Error: The enable/disable logic for the Continue button only checks that the schema is not null, while the caller later treats an empty string as "not selected" (using a generic falsy check), which can allow the button to be enabled even though the subsequent step will immediately reject the action and show an error instead of progressing. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/DatasourceConnector.test.tsx: ########## @@ -0,0 +1,239 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import { + render, + screen, + userEvent, + waitFor, +} from 'spec/helpers/testing-library'; +import DatasourceConnector from './index'; + +// Mock useHistory +const mockPush = jest.fn(); +jest.mock('react-router-dom', () => ({ + ...jest.requireActual('react-router-dom'), + useHistory: () => ({ + push: mockPush, + }), +})); + +// Mock DatabaseModal +jest.mock('src/features/databases/DatabaseModal', () => ({ + __esModule: true, + default: ({ show, onHide, onDatabaseAdd }: any) => + show ? ( + <div data-test="database-modal"> + <button type="button" data-test="modal-close" onClick={onHide}> + Close + </button> + <button + type="button" + data-test="modal-add-db" Review Comment: **Suggestion:** The DatabaseModal mock uses a nonstandard `data-test` attribute while the tests query elements with `getByTestId`/`findByTestId`, which look for `data-testid`; this mismatch means the modal container and "Add Database" button will never be found, causing the modal-related tests to fail even if the component works correctly. [logic error] **Severity Level:** Minor ⚠️ ```suggestion <div data-testid="database-modal"> <button type="button" data-testid="modal-close" onClick={onHide}> Close </button> <button type="button" data-testid="modal-add-db" ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The tests in this file use getByTestId/findByTestId/queryByTestId to locate mocked elements (e.g., 'database-modal' and 'modal-add-db'), but the mock currently renders attributes named data-test. That mismatch means the queries will not find the elements, causing the modal-related tests to fail even though the mock otherwise provides the expected behavior. Switching to data-testid directly fixes the test selectors and is a real, necessary correction rather than a cosmetic change. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/DatasourceConnector.test.tsx **Line:** 42:48 **Comment:** *Logic Error: The DatabaseModal mock uses a nonstandard `data-test` attribute while the tests query elements with `getByTestId`/`findByTestId`, which look for `data-testid`; this mismatch means the modal container and "Add Database" button will never be found, causing the modal-related tests to fail even if the component works correctly. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/DatasourceConnector.test.tsx: ########## @@ -0,0 +1,239 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import { + render, + screen, + userEvent, + waitFor, +} from 'spec/helpers/testing-library'; +import DatasourceConnector from './index'; + +// Mock useHistory +const mockPush = jest.fn(); +jest.mock('react-router-dom', () => ({ + ...jest.requireActual('react-router-dom'), + useHistory: () => ({ + push: mockPush, + }), +})); + +// Mock DatabaseModal +jest.mock('src/features/databases/DatabaseModal', () => ({ + __esModule: true, + default: ({ show, onHide, onDatabaseAdd }: any) => + show ? ( + <div data-test="database-modal"> + <button type="button" data-test="modal-close" onClick={onHide}> + Close + </button> + <button + type="button" + data-test="modal-add-db" + onClick={() => + onDatabaseAdd({ + id: 999, + database_name: 'New DB', + backend: 'sqlite', + }) + } + > + Add Database + </button> + </div> + ) : null, +})); + +// Mock DatabaseSelector +jest.mock('src/components/DatabaseSelector', () => ({ + __esModule: true, + DatabaseSelector: ({ db, schema, onDbChange, onSchemaChange }: any) => ( + <div data-test="database-selector"> + <select + data-test="database-select" + value={db?.id || ''} + onChange={e => { + const id = parseInt(e.target.value, 10); + if (id) { + onDbChange({ id, database_name: `DB ${id}`, backend: 'sqlite' }); + } + }} + > + <option value="">Select database</option> + <option value="1">DB 1</option> + <option value="2">DB 2</option> + </select> + {db && ( + <select + data-test="schema-select" Review Comment: **Suggestion:** The DatabaseSelector mock assigns `data-test` instead of `data-testid` to its wrapper, database select, and schema select elements, but the tests rely on `getByTestId`/`findByTestId`/`queryByTestId`, so these elements will not be found and the schema/continue-button tests will fail even if the component behaves correctly. [logic error] **Severity Level:** Minor ⚠️ ```suggestion <div data-testid="database-selector"> <select data-testid="database-select" value={db?.id || ''} onChange={e => { const id = parseInt(e.target.value, 10); if (id) { onDbChange({ id, database_name: `DB ${id}`, backend: 'sqlite' }); } }} > <option value="">Select database</option> <option value="1">DB 1</option> <option value="2">DB 2</option> </select> {db && ( <select data-testid="schema-select" ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The test file queries the mocked selector via screen.findByTestId('database-select') and screen.findByTestId('schema-select'), but the mock currently renders data-test attributes. This is a real functional mismatch: the tests won't find the mocked select elements. Updating the mock to use data-testid restores alignment between rendered attributes and test queries and fixes failing tests. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/DatasourceConnector.test.tsx **Line:** 67:84 **Comment:** *Logic Error: The DatabaseSelector mock assigns `data-test` instead of `data-testid` to its wrapper, database select, and schema select elements, but the tests rely on `getByTestId`/`findByTestId`/`queryByTestId`, so these elements will not be found and the schema/continue-button tests will fail even if the component behaves correctly. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/components/DatabaseSchemaPicker.tsx: ########## @@ -0,0 +1,98 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +import { useCallback } from 'react'; +import { styled } from '@apache-superset/core/ui'; +import { DatabaseSelector } from 'src/components'; +import type { DatabaseObject } from 'src/components/DatabaseSelector/types'; + +interface DatabaseSchemaPickerProps { + database: DatabaseObject | null; + catalog: string | null; + schema: string | null; + onDatabaseChange: (db: DatabaseObject | null) => void; + onCatalogChange: (catalog: string | null) => void; + onSchemaChange: (schema: string | null) => void; + onError: (msg: string) => void; + onRefreshDatabases?: () => void; +} + +const PickerContainer = styled.div` + ${({ theme }) => ` + width: 100%; + + .database-selector-wrapper { + & > div { + margin-bottom: ${theme.paddingMD}px; + } + } + `} +`; + +export default function DatabaseSchemaPicker({ + database, + catalog, + schema, + onDatabaseChange, + onCatalogChange, + onSchemaChange, + onError, + onRefreshDatabases, +}: DatabaseSchemaPickerProps) { + const handleDbChange = useCallback( + (db: DatabaseObject) => { + onDatabaseChange(db); + // Clear catalog and schema when database changes + onCatalogChange(null); + onSchemaChange(null); + }, + [onDatabaseChange, onCatalogChange, onSchemaChange], + ); + + const handleCatalogChange = useCallback( + (cat?: string) => { + onCatalogChange(cat ?? null); + // Clear schema when catalog changes + onSchemaChange(null); + }, + [onCatalogChange, onSchemaChange], + ); + + const handleSchemaChange = useCallback( + (sch?: string) => { + onSchemaChange(sch ?? null); + }, + [onSchemaChange], + ); + + return ( Review Comment: **Suggestion:** The styled container uses a nested `.database-selector-wrapper` selector while also applying that class directly to the container, so the margin rule never matches the intended element and the layout spacing for the embedded DatabaseSelector is incorrect. [logic error] **Severity Level:** Minor ⚠️ ```suggestion & > div { margin-bottom: ${theme.paddingMD}px; } `} `; return ( <PickerContainer> ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The styled rule targets a nested element with class .database-selector-wrapper, but that class is applied to the PickerContainer itself. Therefore the selector never matches the container's direct child divs and the intended margin-bottom won't be applied. Changing the CSS to target & > div (or removing the className from the JSX and keeping the nested selector consistent) fixes a real layout bug rather than being cosmetic. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/components/DatabaseSchemaPicker.tsx **Line:** 39:83 **Comment:** *Logic Error: The styled container uses a nested `.database-selector-wrapper` selector while also applying that class directly to the container, so the margin rule never matches the intended element and the layout spacing for the embedded DatabaseSelector is incorrect. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/datasource_analyzer/api.py: ########## @@ -0,0 +1,123 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import logging + +from flask import request, Response +from flask_appbuilder.api import expose, protect, safe +from marshmallow import ValidationError + +from superset.constants import MODEL_API_RW_METHOD_PERMISSION_MAP +from superset.datasource_analyzer.commands.initiate import ( + InitiateDatasourceAnalyzerCommand, +) +from superset.datasource_analyzer.exceptions import ( + DatasourceAnalyzerAccessDeniedError, + DatasourceAnalyzerDatabaseNotFoundError, + DatasourceAnalyzerInvalidError, + DatasourceAnalyzerSchemaNotFoundError, +) +from superset.datasource_analyzer.schemas import ( + DatasourceAnalyzerPostSchema, + DatasourceAnalyzerResponseSchema, +) +from superset.extensions import event_logger +from superset.views.base_api import BaseSupersetApi, requires_json, statsd_metrics + +logger = logging.getLogger(__name__) + + +class DatasourceAnalyzerRestApi(BaseSupersetApi): + """REST API for datasource analyzer operations.""" + + method_permission_name = MODEL_API_RW_METHOD_PERMISSION_MAP + allow_browser_login = True + class_permission_name = "DatasourceAnalyzer" + resource_name = "datasource_analyzer" + openapi_spec_tag = "Datasource Analyzer" + openapi_spec_component_schemas = ( + DatasourceAnalyzerPostSchema, + DatasourceAnalyzerResponseSchema, + ) + + @expose("/", methods=("POST",)) + @protect() + @safe + @statsd_metrics + @requires_json + @event_logger.log_this_with_context( + action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.post", + log_to_statsd=True, + ) + def post(self) -> Response: + """Initiate a datasource analysis job. + --- + post: + summary: Initiate datasource analysis + description: >- + Initiates a background job to analyze a database schema. + Returns a run_id that can be used to track the job status. + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/DatasourceAnalyzerPostSchema' + responses: + 200: + description: Analysis job initiated successfully + content: + application/json: + schema: + type: object + properties: + result: + $ref: '#/components/schemas/DatasourceAnalyzerResponseSchema' + 400: + $ref: '#/components/responses/400' + 401: + $ref: '#/components/responses/401' + 403: + $ref: '#/components/responses/403' + 404: + $ref: '#/components/responses/404' + 422: + $ref: '#/components/responses/422' + 500: + $ref: '#/components/responses/500' + """ + try: + schema = DatasourceAnalyzerPostSchema() + data = schema.load(request.json) + except ValidationError as error: + return self.response_400(message=error.messages) + + try: + command = InitiateDatasourceAnalyzerCommand( + database_id=data["database_id"], + schema_name=data["schema_name"], + catalog_name=data.get("catalog_name"), + ) + result = command.run() + return self.response(200, result=result) + except DatasourceAnalyzerDatabaseNotFoundError: + return self.response_404() + except DatasourceAnalyzerAccessDeniedError: + return self.response_403() + except DatasourceAnalyzerSchemaNotFoundError as ex: + return self.response_400(message=str(ex)) Review Comment: **Suggestion:** When the referenced schema does not exist the code currently returns a 400 response, but this condition represents a missing resource and should surface as 404 to match the OpenAPI documentation and common REST semantics, otherwise clients may mis-handle this case as a bad request instead of not found. [logic error] **Severity Level:** Minor ⚠️ ```suggestion return self.response_404(message=str(ex)) ``` <details> <summary><b>Why it matters? ⭐ </b></summary> A missing schema is a missing resource; the OpenAPI responses enumerated for this endpoint include 404. Mapping SchemaNotFound to 404 aligns with REST semantics and the documented contract, so this is a correctness fix rather than a cosmetic change. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/datasource_analyzer/api.py **Line:** 121:121 **Comment:** *Logic Error: When the referenced schema does not exist the code currently returns a 400 response, but this condition represents a missing resource and should surface as 404 to match the OpenAPI documentation and common REST semantics, otherwise clients may mis-handle this case as a bad request instead of not found. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/datasource_analyzer/api.py: ########## @@ -0,0 +1,123 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import logging + +from flask import request, Response +from flask_appbuilder.api import expose, protect, safe +from marshmallow import ValidationError + +from superset.constants import MODEL_API_RW_METHOD_PERMISSION_MAP +from superset.datasource_analyzer.commands.initiate import ( + InitiateDatasourceAnalyzerCommand, +) +from superset.datasource_analyzer.exceptions import ( + DatasourceAnalyzerAccessDeniedError, + DatasourceAnalyzerDatabaseNotFoundError, + DatasourceAnalyzerInvalidError, + DatasourceAnalyzerSchemaNotFoundError, +) +from superset.datasource_analyzer.schemas import ( + DatasourceAnalyzerPostSchema, + DatasourceAnalyzerResponseSchema, +) +from superset.extensions import event_logger +from superset.views.base_api import BaseSupersetApi, requires_json, statsd_metrics + +logger = logging.getLogger(__name__) + + +class DatasourceAnalyzerRestApi(BaseSupersetApi): + """REST API for datasource analyzer operations.""" + + method_permission_name = MODEL_API_RW_METHOD_PERMISSION_MAP + allow_browser_login = True + class_permission_name = "DatasourceAnalyzer" + resource_name = "datasource_analyzer" + openapi_spec_tag = "Datasource Analyzer" + openapi_spec_component_schemas = ( + DatasourceAnalyzerPostSchema, + DatasourceAnalyzerResponseSchema, + ) + + @expose("/", methods=("POST",)) + @protect() + @safe + @statsd_metrics + @requires_json + @event_logger.log_this_with_context( + action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.post", + log_to_statsd=True, + ) + def post(self) -> Response: + """Initiate a datasource analysis job. + --- + post: + summary: Initiate datasource analysis + description: >- + Initiates a background job to analyze a database schema. + Returns a run_id that can be used to track the job status. + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/DatasourceAnalyzerPostSchema' + responses: + 200: + description: Analysis job initiated successfully + content: + application/json: + schema: + type: object + properties: + result: + $ref: '#/components/schemas/DatasourceAnalyzerResponseSchema' + 400: + $ref: '#/components/responses/400' + 401: + $ref: '#/components/responses/401' + 403: + $ref: '#/components/responses/403' + 404: + $ref: '#/components/responses/404' + 422: + $ref: '#/components/responses/422' + 500: + $ref: '#/components/responses/500' + """ + try: + schema = DatasourceAnalyzerPostSchema() + data = schema.load(request.json) + except ValidationError as error: + return self.response_400(message=error.messages) Review Comment: **Suggestion:** Request payload validation failures are currently returned as HTTP 400, whereas in Superset APIs these semantic validation errors are typically reported as 422, which can break clients that rely on 422 to distinguish invalid payloads from malformed requests. [possible bug] **Severity Level:** Critical 🚨 ```suggestion return self.response_422(message=error.messages) ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The marshmallow ValidationError represents a semantic validation failure of a well-formed request body. The OpenAPI block in this endpoint already lists 422 as a possible response and other errors in this handler map semantic problems to 422 (DatasourceAnalyzerInvalidError -> response_422). Returning 422 for schema validation keeps behavior consistent with the docs and with how clients usually distinguish malformed vs semantically invalid payloads. This is a real behavioral fix, not mere style. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/datasource_analyzer/api.py **Line:** 106:106 **Comment:** *Possible Bug: Request payload validation failures are currently returned as HTTP 400, whereas in Superset APIs these semantic validation errors are typically reported as 422, which can break clients that rely on 422 to distinguish invalid payloads from malformed requests. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/components/ReviewSchemaPanel.tsx: ########## @@ -0,0 +1,331 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +import { useState, useEffect } from 'react'; +import { t } from '@superset-ui/core'; +import { styled, useTheme } from '@apache-superset/core/ui'; +import { Flex, Icons, Typography } from '@superset-ui/core/components'; + +interface ReviewSchemaPanelProps { + databaseName: string | null; + schemaName: string | null; + onAnalysisComplete: () => void; +} + +enum AnalysisStep { + CONNECTING = 0, + SCANNING = 1, + ANALYZING = 2, + AI_INTERPRETATION = 3, + COMPLETE = 4, +} + +const analysisSteps = [ + { + key: AnalysisStep.CONNECTING, + title: t('Connecting to database'), + description: t('Establishing secure connection'), + }, + { + key: AnalysisStep.SCANNING, + title: t('Scanning schema'), + description: t('Discovering tables, views, and relationships'), + }, + { + key: AnalysisStep.ANALYZING, + title: t('Analyzing structure'), + description: t('Identifying primary keys, foreign keys, and data types'), + }, + { + key: AnalysisStep.AI_INTERPRETATION, + title: t('AI interpretation'), + description: t('Generating semantic descriptions for tables and columns'), + }, +]; + +const Spinner = styled.div` + ${({ theme }) => ` + width: ${theme.sizeUnit * 20}px; + height: ${theme.sizeUnit * 20}px; + border: ${theme.sizeUnit}px solid ${theme.colorBgContainer}; + border-top: ${theme.sizeUnit}px solid ${theme.colorPrimary}; + border-radius: 50%; + animation: spin 1s linear infinite; + + @keyframes spin { + 0% { transform: rotate(0deg); } + 100% { transform: rotate(360deg); } + } + `} +`; + +const Subtitle = styled(Typography.Text)` + ${({ theme }) => ` + display: block; + text-align: center; + + .database-name { + color: ${theme.colorPrimary}; + font-weight: ${theme.fontWeightStrong}; + } + `} +`; + +const ProgressBarContainer = styled.div` + ${({ theme }) => ` + width: 100%; + height: ${theme.sizeUnit * 1.5}px; + background-color: ${theme.colorBgContainer}; + border-radius: ${theme.borderRadiusSM}px; + overflow: hidden; + `} +`; + +const ProgressBar = styled.div<{ progress: number }>` + ${({ theme, progress }) => ` + height: 100%; + width: ${progress}%; + background: linear-gradient(90deg, ${theme.colorPrimary} 0%, ${theme.colorSuccess} 100%); + border-radius: ${theme.borderRadiusSM}px; + transition: width 0.5s ease-in-out; + `} +`; + +const StepsContainer = styled.div` + ${({ theme }) => ` + width: 100%; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingLG}px; + `} +`; + +const StepIcon = styled.div<{ isActive: boolean; isCompleted: boolean }>` + ${({ theme, isActive, isCompleted }) => ` + width: ${theme.sizeUnit * 6}px; + height: ${theme.sizeUnit * 6}px; + min-width: ${theme.sizeUnit * 6}px; + border-radius: 50%; + display: flex; + align-items: center; + justify-content: center; + font-size: ${theme.fontSizeSM}px; + + ${ + isCompleted + ? ` + background-color: ${theme.colorSuccess}; + color: white; + ` + : isActive + ? ` + background-color: ${theme.colorPrimary}; + color: white; + ` + : ` + background-color: ${theme.colorBgBase}; + border: 1px solid ${theme.colorBorder}; + color: ${theme.colorTextSecondary}; + ` + } + `} +`; + +const StepTitle = styled.p<{ isActive: boolean; isCompleted: boolean }>` + ${({ theme, isActive, isCompleted }) => ` + font-size: ${theme.fontSize}px; + font-weight: ${theme.fontWeightStrong}; + color: ${isActive || isCompleted ? theme.colorText : theme.colorTextSecondary}; + margin: 0 0 2px 0; + `} +`; + +const StepDescription = styled.p` + ${({ theme }) => ` + font-size: ${theme.fontSizeSM}px; + color: ${theme.colorTextSecondary}; + margin: 0; + `} +`; + +const InfoBanner = styled.div` + ${({ theme }) => ` + width: 100%; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingMD}px ${theme.paddingLG}px; + `} +`; + +const InfoIcon = styled.div` + margin-top: 2px; + flex-shrink: 0; +`; + +const PanelContainer = styled(Flex)` + ${({ theme }) => ` + width: 100%; + max-width: 600px; + gap: ${theme.marginLG}px; + `} +`; + +const TitleSection = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepsListContainer = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepRow = styled(Flex)` + ${({ theme }) => ` + padding: ${theme.paddingSM}px 0; + gap: ${theme.marginSM}px; + `} +`; + +const InfoBannerContent = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepContent = styled(Flex)` + flex: 1; +`; + +export default function ReviewSchemaPanel({ + databaseName, + schemaName: _schemaName, + onAnalysisComplete, +}: ReviewSchemaPanelProps) { + const theme = useTheme(); + const [currentStep, setCurrentStep] = useState<AnalysisStep>( + AnalysisStep.CONNECTING, + ); + + // Simulate the analysis progress + useEffect(() => { + const stepDurations = [1500, 2000, 2500, 3000]; // ms for each step + let timeoutId: NodeJS.Timeout; + + const advanceStep = (step: AnalysisStep) => { + if (step < AnalysisStep.COMPLETE) { + timeoutId = setTimeout(() => { + const nextStep = step + 1; + setCurrentStep(nextStep); + if (nextStep < AnalysisStep.COMPLETE) { + advanceStep(nextStep); + } else { + // Analysis complete - trigger callback after a short delay + setTimeout(() => { + onAnalysisComplete(); + }, 1000); + } + }, stepDurations[step]); + } + }; + + advanceStep(AnalysisStep.CONNECTING); + + return () => { + if (timeoutId) { + clearTimeout(timeoutId); + } + }; + }, [onAnalysisComplete]); + + const progress = ((currentStep + 1) / (analysisSteps.length + 1)) * 100; + + return ( + <PanelContainer vertical align="center"> + <Spinner /> + + <TitleSection vertical align="center"> + <Typography.Title css={{ margin: 0, textAlign: 'center' }} level={3}> + {t('Analyzing database schema')} + </Typography.Title> + <Subtitle type="secondary"> + {t('Connected to')}{' '} + <span className="database-name">{databaseName || 'database'}</span> Review Comment: **Suggestion:** The `schemaName` prop passed into this panel is currently ignored in the UI, so users cannot see which schema is being analyzed, which is misleading in a multi-schema context and wastes a required prop. [logic error] **Severity Level:** Minor ⚠️ ```suggestion <span className="database-name"> {databaseName || 'database'} {_schemaName ? `.${_schemaName}` : ''} </span> ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The PR indeed accepts schemaName (aliased to _schemaName) but doesn't display it — a small UX/logic omission. Appending the schema (when present) to the displayed database name is straightforward, safe, and improves clarity in multi-schema contexts. It's not fixing a runtime bug but it's a valid, actionable improvement tied to the diff. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/components/ReviewSchemaPanel.tsx **Line:** 272:272 **Comment:** *Logic Error: The `schemaName` prop passed into this panel is currently ignored in the UI, so users cannot see which schema is being analyzed, which is misleading in a multi-schema context and wastes a required prop. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset-frontend/src/pages/DatasourceConnector/components/ReviewSchemaPanel.tsx: ########## @@ -0,0 +1,331 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +import { useState, useEffect } from 'react'; +import { t } from '@superset-ui/core'; +import { styled, useTheme } from '@apache-superset/core/ui'; +import { Flex, Icons, Typography } from '@superset-ui/core/components'; + +interface ReviewSchemaPanelProps { + databaseName: string | null; + schemaName: string | null; + onAnalysisComplete: () => void; +} + +enum AnalysisStep { + CONNECTING = 0, + SCANNING = 1, + ANALYZING = 2, + AI_INTERPRETATION = 3, + COMPLETE = 4, +} + +const analysisSteps = [ + { + key: AnalysisStep.CONNECTING, + title: t('Connecting to database'), + description: t('Establishing secure connection'), + }, + { + key: AnalysisStep.SCANNING, + title: t('Scanning schema'), + description: t('Discovering tables, views, and relationships'), + }, + { + key: AnalysisStep.ANALYZING, + title: t('Analyzing structure'), + description: t('Identifying primary keys, foreign keys, and data types'), + }, + { + key: AnalysisStep.AI_INTERPRETATION, + title: t('AI interpretation'), + description: t('Generating semantic descriptions for tables and columns'), + }, +]; + +const Spinner = styled.div` + ${({ theme }) => ` + width: ${theme.sizeUnit * 20}px; + height: ${theme.sizeUnit * 20}px; + border: ${theme.sizeUnit}px solid ${theme.colorBgContainer}; + border-top: ${theme.sizeUnit}px solid ${theme.colorPrimary}; + border-radius: 50%; + animation: spin 1s linear infinite; + + @keyframes spin { + 0% { transform: rotate(0deg); } + 100% { transform: rotate(360deg); } + } + `} +`; + +const Subtitle = styled(Typography.Text)` + ${({ theme }) => ` + display: block; + text-align: center; + + .database-name { + color: ${theme.colorPrimary}; + font-weight: ${theme.fontWeightStrong}; + } + `} +`; + +const ProgressBarContainer = styled.div` + ${({ theme }) => ` + width: 100%; + height: ${theme.sizeUnit * 1.5}px; + background-color: ${theme.colorBgContainer}; + border-radius: ${theme.borderRadiusSM}px; + overflow: hidden; + `} +`; + +const ProgressBar = styled.div<{ progress: number }>` + ${({ theme, progress }) => ` + height: 100%; + width: ${progress}%; + background: linear-gradient(90deg, ${theme.colorPrimary} 0%, ${theme.colorSuccess} 100%); + border-radius: ${theme.borderRadiusSM}px; + transition: width 0.5s ease-in-out; + `} +`; + +const StepsContainer = styled.div` + ${({ theme }) => ` + width: 100%; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingLG}px; + `} +`; + +const StepIcon = styled.div<{ isActive: boolean; isCompleted: boolean }>` + ${({ theme, isActive, isCompleted }) => ` + width: ${theme.sizeUnit * 6}px; + height: ${theme.sizeUnit * 6}px; + min-width: ${theme.sizeUnit * 6}px; + border-radius: 50%; + display: flex; + align-items: center; + justify-content: center; + font-size: ${theme.fontSizeSM}px; + + ${ + isCompleted + ? ` + background-color: ${theme.colorSuccess}; + color: white; + ` + : isActive + ? ` + background-color: ${theme.colorPrimary}; + color: white; + ` + : ` + background-color: ${theme.colorBgBase}; + border: 1px solid ${theme.colorBorder}; + color: ${theme.colorTextSecondary}; + ` + } + `} +`; + +const StepTitle = styled.p<{ isActive: boolean; isCompleted: boolean }>` + ${({ theme, isActive, isCompleted }) => ` + font-size: ${theme.fontSize}px; + font-weight: ${theme.fontWeightStrong}; + color: ${isActive || isCompleted ? theme.colorText : theme.colorTextSecondary}; + margin: 0 0 2px 0; + `} +`; + +const StepDescription = styled.p` + ${({ theme }) => ` + font-size: ${theme.fontSizeSM}px; + color: ${theme.colorTextSecondary}; + margin: 0; + `} +`; + +const InfoBanner = styled.div` + ${({ theme }) => ` + width: 100%; + background-color: ${theme.colorBgContainer}; + border: 1px solid ${theme.colorBorder}; + border-radius: ${theme.borderRadius}px; + padding: ${theme.paddingMD}px ${theme.paddingLG}px; + `} +`; + +const InfoIcon = styled.div` + margin-top: 2px; + flex-shrink: 0; +`; + +const PanelContainer = styled(Flex)` + ${({ theme }) => ` + width: 100%; + max-width: 600px; + gap: ${theme.marginLG}px; + `} +`; + +const TitleSection = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepsListContainer = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepRow = styled(Flex)` + ${({ theme }) => ` + padding: ${theme.paddingSM}px 0; + gap: ${theme.marginSM}px; + `} +`; + +const InfoBannerContent = styled(Flex)` + ${({ theme }) => ` + gap: ${theme.marginSM}px; + `} +`; + +const StepContent = styled(Flex)` + flex: 1; +`; + +export default function ReviewSchemaPanel({ + databaseName, + schemaName: _schemaName, + onAnalysisComplete, +}: ReviewSchemaPanelProps) { + const theme = useTheme(); + const [currentStep, setCurrentStep] = useState<AnalysisStep>( + AnalysisStep.CONNECTING, + ); + + // Simulate the analysis progress + useEffect(() => { + const stepDurations = [1500, 2000, 2500, 3000]; // ms for each step + let timeoutId: NodeJS.Timeout; + + const advanceStep = (step: AnalysisStep) => { + if (step < AnalysisStep.COMPLETE) { + timeoutId = setTimeout(() => { + const nextStep = step + 1; + setCurrentStep(nextStep); + if (nextStep < AnalysisStep.COMPLETE) { + advanceStep(nextStep); + } else { + // Analysis complete - trigger callback after a short delay + setTimeout(() => { + onAnalysisComplete(); + }, 1000); + } + }, stepDurations[step]); + } + }; + + advanceStep(AnalysisStep.CONNECTING); + + return () => { + if (timeoutId) { + clearTimeout(timeoutId); Review Comment: **Suggestion:** The timeout that triggers the completion callback is never cleared on unmount, so `onAnalysisComplete` can still fire after the panel has been removed, potentially causing state updates on an unmounted parent component and leaking timers. [resource leak] **Severity Level:** Minor ⚠️ ```suggestion let completionTimeoutId: NodeJS.Timeout; const advanceStep = (step: AnalysisStep) => { if (step < AnalysisStep.COMPLETE) { timeoutId = setTimeout(() => { const nextStep = step + 1; setCurrentStep(nextStep); if (nextStep < AnalysisStep.COMPLETE) { advanceStep(nextStep); } else { // Analysis complete - trigger callback after a short delay completionTimeoutId = setTimeout(() => { onAnalysisComplete(); }, 1000); } }, stepDurations[step]); } }; advanceStep(AnalysisStep.CONNECTING); return () => { if (timeoutId) { clearTimeout(timeoutId); } if (completionTimeoutId) { clearTimeout(completionTimeoutId); ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The suggestion correctly identifies a real resource-leak risk in the PR hunk: the inner setTimeout that calls onAnalysisComplete() isn't stored, so the cleanup only clears the most recently assigned timeoutId and won't cancel the completion delay. Adding a tracked completionTimeoutId and clearing it on unmount fixes that. Also consider that timeoutId is reassigned as recursion progresses (only one active step timer at a time), so tracking both timers is sufficient here. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset-frontend/src/pages/DatasourceConnector/components/ReviewSchemaPanel.tsx **Line:** 233:255 **Comment:** *Resource Leak: The timeout that triggers the completion callback is never cleared on unmount, so `onAnalysisComplete` can still fire after the panel has been removed, potentially causing state updates on an unmounted parent component and leaking timers. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/datasource_analyzer/commands/initiate.py: ########## @@ -0,0 +1,110 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import logging +import uuid +from typing import Any, Optional + +from superset.commands.base import BaseCommand +from superset.daos.database import DatabaseDAO +from superset.datasource_analyzer.exceptions import ( + DatasourceAnalyzerAccessDeniedError, + DatasourceAnalyzerDatabaseNotFoundError, + DatasourceAnalyzerInvalidError, + DatasourceAnalyzerSchemaNotFoundError, +) +from superset.extensions import security_manager + +logger = logging.getLogger(__name__) + + +class InitiateDatasourceAnalyzerCommand(BaseCommand): + """ + Command to initiate a datasource analysis job. + + This command validates the database and schema, checks user access, + and initiates an analysis job (placeholder for Celery task). + """ + + def __init__( + self, + database_id: int, + schema_name: str, + catalog_name: Optional[str] = None, + ): + self._database_id = database_id + self._schema_name = schema_name + self._catalog_name = catalog_name + self._database = None + + def run(self) -> dict[str, Any]: + """ + Execute the command to initiate datasource analysis. + + :returns: Dictionary containing the run_id + :raises: DatasourceAnalyzerInvalidError if validation fails + """ + self.validate() + + # Generate a unique run_id for this analysis job + run_id = str(uuid.uuid4()) + + # TODO: Enqueue Celery task for actual schema introspection + # celery_task.delay(run_id, self._database_id, self._schema_name, self._catalog_name) + + logger.info( + "Initiated datasource analysis job %s for database %s, schema %s", + run_id, + self._database_id, + self._schema_name, + ) + + return {"run_id": run_id} + + def validate(self) -> None: + """ + Validate the command parameters. + + :raises: DatasourceAnalyzerDatabaseNotFoundError if database doesn't exist + :raises: DatasourceAnalyzerAccessDeniedError if user doesn't have access + :raises: DatasourceAnalyzerSchemaNotFoundError if schema doesn't exist + :raises: DatasourceAnalyzerInvalidError for other validation failures + """ + exceptions: list[Exception] = [] + + # Verify database exists + self._database = DatabaseDAO.find_by_id(self._database_id) + if not self._database: + raise DatasourceAnalyzerDatabaseNotFoundError() + + # Verify user has access to the database + if not security_manager.can_access_database(self._database): + raise DatasourceAnalyzerAccessDeniedError() + + # Verify schema exists in the database + try: + schemas = self._database.get_all_schema_names( + catalog=self._catalog_name, + cache=False, + ) + if self._schema_name not in schemas: + raise DatasourceAnalyzerSchemaNotFoundError() Review Comment: **Suggestion:** The schema-not-found condition is raised inside the `try` block but immediately caught by the broad `except Exception` handler and converted into a generic `DatasourceAnalyzerInvalidError`, so the API layer never sees `DatasourceAnalyzerSchemaNotFoundError` and will return a 422 instead of the intended 400 response for missing schemas. [logic error] **Severity Level:** Minor ⚠️ ```suggestion # propagate specific schema-not-found error so API can map it correctly raise DatasourceAnalyzerSchemaNotFoundError() except DatasourceAnalyzerSchemaNotFoundError: # re-raise to be handled explicitly by the API layer raise ``` <details> <summary><b>Why it matters? ⭐ </b></summary> The suggestion correctly identifies a real logic/exception-handling problem: raising DatasourceAnalyzerSchemaNotFoundError inside the broad try will be caught by the general except and turned into a generic failure (appended to exceptions) and ultimately a DatasourceAnalyzerInvalidError. That changes the visible error mapping (likely a 422) instead of allowing the specific SchemaNotFound exception to propagate (mapped to the intended 400). Re-raising DatasourceAnalyzerSchemaNotFoundError (or handling it separately) is the correct fix so the API layer receives the intended exception type and status code. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/datasource_analyzer/commands/initiate.py **Line:** 104:104 **Comment:** *Logic Error: The schema-not-found condition is raised inside the `try` block but immediately caught by the broad `except Exception` handler and converted into a generic `DatasourceAnalyzerInvalidError`, so the API layer never sees `DatasourceAnalyzerSchemaNotFoundError` and will return a 422 instead of the intended 400 response for missing schemas. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/datasource_analyzer/schemas.py: ########## @@ -0,0 +1,43 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +from marshmallow import fields, Schema + + +class DatasourceAnalyzerPostSchema(Schema): + """Schema for POST /api/v1/datasource_analyzer/""" + + database_id = fields.Integer( + required=True, + metadata={"description": "The ID of the database connection"}, + ) + schema_name = fields.String( + required=True, + metadata={"description": "The name of the schema to analyze"}, + ) + catalog_name = fields.String( + required=False, + load_default=None, Review Comment: **Suggestion:** The optional `catalog_name` field is configured with `load_default=None` but does not allow `None` values, so a client sending `"catalog_name": null` (a common pattern for optional JSON fields) will fail schema validation with a 400 error even though the command logic accepts `None`; allowing `None` aligns validation with the intended contract and backend behavior. [logic error] **Severity Level:** Minor ⚠️ ```suggestion allow_none=True, ``` <details> <summary><b>Why it matters? ⭐ </b></summary> Marshmallow's fields.String does not accept JSON null by default (allow_none defaults to False). The schema already sets load_default=None, which signals the intent that absence should map to None. If a client sends "catalog_name": null today, validation will fail even though the backend logic likely accepts None. Adding allow_none=True fixes a real validation bug and aligns schema validation with the intended contract. </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/datasource_analyzer/schemas.py **Line:** 33:33 **Comment:** *Logic Error: The optional `catalog_name` field is configured with `load_default=None` but does not allow `None` values, so a client sending `"catalog_name": null` (a common pattern for optional JSON fields) will fail schema validation with a 400 error even though the command logic accepts `None`; allowing `None` aligns validation with the intended contract and backend behavior. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
