[GitHub] [superset] villebro commented on a change in pull request #15279: feat: run extra query on QueryObject and add compare operator for post_processing

GitBox Wed, 23 Jun 2021 02:45:59 -0700


villebro commented on a change in pull request #15279:
URL: https://github.com/apache/superset/pull/15279#discussion_r656928933




##########
File path: superset/common/query_context.py
##########
@@ -97,6 +101,62 @@ def __init__(  # pylint: disable=too-many-arguments
             "result_format": self.result_format,
         }
 
+    def processing_time_offset(
+        self, df: pd.DataFrame, query_object: QueryObject,
+    ) -> Tuple[pd.DataFrame, List[str]]:
+        # ensure query_object is immutable
+        query_object_clone = copy.copy(query_object)
+        rv_sql = []
+
+        time_offset = query_object.time_offset
+        outer_from_dttm = query_object.from_dttm
+        outer_to_dttm = query_object.to_dttm
+        for offset in time_offset:
+            try:
+                query_object_clone.from_dttm = get_past_or_future(
+                    offset, outer_from_dttm,
+                )
+                query_object_clone.to_dttm = get_past_or_future(offset, outer_to_dttm,)
+            except ValueError as ex:
+                raise QueryObjectValidationError(str(ex))
+            # make sure subquery use main query where clause
+            query_object_clone.inner_from_dttm = outer_from_dttm
+            query_object_clone.inner_to_dttm = outer_to_dttm
+            query_object_clone.time_offset = []

Review comment:
   I wonder if we should rename the current (list-valued) `time_offset` field to `time_offsets`, add a new single-valued `time_offset` field to the `QueryObject` schema, and include both in the cache key. Example:
   
   We want to make a query with two offsets: one year ago and two years ago.
   
   The "actual" main query that gets executed and cached (no additional columns added yet):
   ```python
   time_offsets: None
   time_offset: None
   ```
   
   First extra query (gets concatenated to the previous dataframe):
   ```python
   time_offsets: None
   time_offset: 1
   ```
   
   Second extra query (also concatenated to the main dataframe):
   ```python
   time_offsets: None
   time_offset: 2
   ```
   
   Finally, when the full query object is constructed, the combined result would be cached under the following keys:
   
   ```python
   time_offsets: [1, 2]
   time_offset: None
   ```
   
   This way, the main query result would be persisted together with the extra query results, so the full dataframe would not need to be rebuilt on each request, and each extra query could also be cached individually.



