dosubot[bot] commented on issue #30377: URL: https://github.com/apache/superset/issues/30377#issuecomment-2371761128
<!-- Greeting -->
Hey @luizcapu! 👋 I'm here to help with bugs, answer questions, and assist you while you wait for a human maintainer. Let's squash those bugs together!

<!-- Answer -->
The `get_or_create` Dataset API endpoint returns a `500 - Internal Server Error` when you try to create a dataset with the same `table_name` but a different `schema` because the current implementation looks up an existing table only by `table_name` and `database_id`. It does not take the `schema` into account. If a table with the same `table_name` already exists in that database, the endpoint returns the existing table's ID regardless of schema, which can lead to conflicts or errors <sup>[[1]](https://github.com/apache/superset/blob/master/superset/datasets/api.py)</sup>.

The relevant code snippet is:

```python
@expose("/get_or_create/", methods=("POST",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
    action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
    f".get_or_create_dataset",
    log_to_statsd=False,
)
def get_or_create_dataset(self) -> Response:
    """Retrieve a dataset by name, or create it if it does not exist.
    ---
    post:
      summary: Retrieve a table by name, or create it if it does not exist
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GetOrCreateDatasetSchema'
      responses:
        200:
          description: The ID of the table
          content:
            application/json:
              schema:
                type: object
                properties:
                  result:
                    type: object
                    properties:
                      table_id:
                        type: integer
        400:
          $ref: '#/components/responses/400'
        401:
          $ref: '#/components/responses/401'
        422:
          $ref: '#/components/responses/422'
        500:
          $ref: '#/components/responses/500'
    """
    try:
        body = GetOrCreateDatasetSchema().load(request.json)
    except ValidationError as ex:
        return self.response(400, message=ex.messages)
    table_name = body["table_name"]
    database_id = body["database_id"]
    if table := DatasetDAO.get_table_by_name(database_id, table_name):
        return self.response(200, result={"table_id": table.id})
```

The existence check uses only `database_id` and `table_name`:

```python
if table := DatasetDAO.get_table_by_name(database_id, table_name):
    return self.response(200, result={"table_id": table.id})
```

Because this lookup ignores the `schema`, a request for the same `table_name` under a different `schema` can run into a conflict or inconsistency and surface as a `500 - Internal Server Error` <sup>[[1]](https://github.com/apache/superset/blob/master/superset/datasets/api.py)</sup>.

Additionally, the test case `test_create_dataset_same_name_different_schema` in the integration tests shows that creating datasets with the same `table_name` but a different `schema` is a valid scenario, one that the current implementation does not handle correctly <sup>[[2]](https://github.com/apache/superset/blob/master/tests/integration_tests/datasets/api_tests.py)</sup>.

<!-- Dosu Comment Footer -->
*To continue talking to [Dosu](https://dosu.dev), mention @dosu.*

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
