[GitHub] [superset] kzosabe commented on a change in pull request #18694: feat(bigquery): add support for query cost estimate

GitBox Mon, 14 Feb 2022 03:36:48 -0800


kzosabe commented on a change in pull request #18694:
URL: https://github.com/apache/superset/pull/18694#discussion_r805754941




##########
File path: superset/db_engine_specs/bigquery.py
##########
@@ -185,6 +185,60 @@ class BigQueryEngineSpec(BaseEngineSpec):
         ),
     }
 
+    @classmethod
+    def get_allow_cost_estimate(cls, extra: Dict[str, Any]) -> bool:
+        return True
+
+    @classmethod
+    def estimate_statement_cost(
+        cls, statement: str, cursor: Any, engine: Engine
+    ) -> Dict[str, Any]:
+        try:
+            # pylint: disable=import-outside-toplevel
+            from google.cloud import bigquery
+            from google.oauth2 import service_account
+        except ImportError as ex:
+            raise Exception(
+                "Could not import libraries `google.cloud` or `google.oauth2`, "
+                "which are required to be installed in your environment in order "
+                "to estimate cost"
+            ) from ex
+
+        creds = engine.dialect.credentials_info
+        credentials = service_account.Credentials.from_service_account_info(creds)
+        client = bigquery.Client(credentials=credentials)
+        dry_run_result = client.query(
+            statement, bigquery.job.QueryJobConfig(dry_run=True)
+        )
+
+        return {
+            "Total bytes processed": dry_run_result.total_bytes_processed,
+        }
+
+    @classmethod
+    def query_cost_formatter(
+        cls, raw_cost: List[Dict[str, Any]]
+    ) -> List[Dict[str, str]]:
+        def format_bytes_str(raw_bytes: int) -> str:
+            if not isinstance(raw_bytes, int):
+                return str(raw_bytes)
+            units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
+            index = 0
+            bytes = float(raw_bytes)
+            while bytes >= 1024 and index < len(units) - 1:
+                bytes /= 1024
+                index += 1
+
+            return "{:.1f}".format(bytes) + f" {units[index]}"
+
+        return [
+            {
+                k: format_bytes_str(v) if k == "Total bytes processed" else str(v)
+                for k, v in row.items()
+            }
+            for row in raw_cost
+        ]

Review comment:
       Thanks for the review!
   
   > so we could remove the duplication?
   
   Going DRY may well be better, but there are a few things to consider.
   
   The intent of this implementation was to stay consistent with the official BigQuery UI, both in using KiB (1024-based) notation and in rounding to the first decimal place.
   <img width="286" alt="Screenshot 2022-02-14 20 32 57" src="https://user-images.githubusercontent.com/1046476/153856821-3b8f00af-0f99-467b-afac-da850013e04f.png">
   
   
   In particular, the current Presto and Trino implementations divide by 1000 instead of 1024, which is a problem here.
   The estimated byte count shown in Superset would differ slightly from the one shown in BigQuery for the same query.
   I would like to avoid reusing the current humanize implementation as-is, because that discrepancy would confuse users.
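A minimal sketch of how a shared helper could address both the duplication question and the 1000-vs-1024 concern, assuming a hypothetical `humanize_bytes` function that does not exist in Superset: the divisor becomes a parameter, so BigQuery could keep 1024-based units (matching its UI) while other engines keep 1000. The decimal unit labels (`KB`, `MB`, ...) are an assumption and may not match the exact labels the Presto/Trino formatters print.

```python
from typing import List


def humanize_bytes(num_bytes: int, base: int = 1024) -> str:
    """Hypothetical shared formatter: byte count rounded to one decimal place.

    base=1024 -> binary units (KiB, MiB, ...), as shown in the BigQuery UI.
    base=1000 -> decimal units (KB, MB, ...); labels are an assumption.
    """
    if base == 1024:
        units: List[str] = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    elif base == 1000:
        units = ["B", "KB", "MB", "GB", "TB", "PB"]
    else:
        raise ValueError("base must be 1000 or 1024")

    value = float(num_bytes)
    index = 0
    # Keep dividing while the value is at least one unit of the next size,
    # stopping at the largest unit we know about.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.1f} {units[index]}"


if __name__ == "__main__":
    ten_gib = 10 * 1024**3  # 10,737,418,240 bytes
    print(humanize_bytes(ten_gib))             # binary divisor
    print(humanize_bytes(ten_gib, base=1000))  # decimal divisor
```

Under this sketch, a dry run that reports exactly 10 GiB would be rendered as "10.0 GiB" with base 1024 but "10.7 GB" with base 1000, which is the kind of mismatch with the BigQuery UI that the comment wants to avoid.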




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


