Re: [PR] chore: Optimize fetching samples logic [superset]

via GitHub Sun, 19 Nov 2023 13:42:14 -0800


john-bodley commented on code in PR #25995:
URL: https://github.com/apache/superset/pull/25995#discussion_r1398499898



##########
superset/views/datasource/utils.py:
##########
@@ -104,21 +104,18 @@ def get_samples(  # pylint: disable=too-many-arguments,too-many-locals
         result_type=ChartDataResultType.FULL,
         force=force,
     )
-    samples_results = samples_instance.get_payload()
-    count_star_results = count_star_instance.get_payload()
 
     try:
-        sample_data = samples_results["queries"][0]
-        count_star_data = count_star_results["queries"][0]
-        failed_status = (
-            sample_data.get("status") == QueryStatus.FAILED
-            or count_star_data.get("status") == QueryStatus.FAILED
-        )
-        error_msg = sample_data.get("error") or count_star_data.get("error")
-        if failed_status and error_msg:
-            cache_key = sample_data.get("cache_key")
-            QueryCacheManager.delete(cache_key, region=CacheRegion.DATA)
-            raise DatasetSamplesFailedError(error_msg)
+        count_star_data = count_star_instance.get_payload()["queries"][0]
+
+        if count_star_data.get("status") == QueryStatus.FAILED:
+            raise DatasetSamplesFailedError(count_star_data.get("error"))
+
+        sample_data = samples_instance.get_payload()["queries"][0]
+
+        if sample_data.get("status") == QueryStatus.FAILED:
+            QueryCacheManager.delete(sample_data.get("cache_key"), CacheRegion.DATA)

Review Comment:
   Thanks @zhaoyongjie for the comment. Thinking about this more, if a query 
fails then there shouldn’t be anything cached. Hence, in the new sequential 
formulation, if the `COUNT(*)` query succeeds but the sample data query fails, 
we should actually be removing the cached result of the former.
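
A minimal sketch of that suggestion, not the actual Superset code: `fetch_samples`, `SamplesFailedError`, `FAILED`, and the three callables are hypothetical stand-ins for `get_samples`, `DatasetSamplesFailedError`, `QueryStatus.FAILED`, the two `get_payload()["queries"][0]` lookups, and `QueryCacheManager.delete`. It assumes a failed query leaves nothing in the cache, so only the successful `COUNT(*)` entry needs evicting.

```python
from typing import Any, Callable, Optional

FAILED = "failed"  # hypothetical stand-in for QueryStatus.FAILED


class SamplesFailedError(Exception):
    """Hypothetical stand-in for DatasetSamplesFailedError."""


def fetch_samples(
    run_count_star: Callable[[], dict[str, Any]],
    run_samples: Callable[[], dict[str, Any]],
    delete_cached: Callable[[Optional[str]], None],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Run COUNT(*) first, then the sample query, sequentially."""
    count_star_data = run_count_star()
    if count_star_data.get("status") == FAILED:
        # Assumption: a failed query caches nothing, so there is nothing to evict.
        raise SamplesFailedError(count_star_data.get("error"))

    sample_data = run_samples()
    if sample_data.get("status") == FAILED:
        # COUNT(*) succeeded and was cached; evict it so a retry starts clean.
        delete_cached(count_star_data.get("cache_key"))
        raise SamplesFailedError(sample_data.get("error"))

    return count_star_data, sample_data
```

With an in-memory dict as the cache, a failing sample query should leave no entry behind for the earlier `COUNT(*)` result.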


