dstandish commented on a change in pull request #16821:
URL: https://github.com/apache/airflow/pull/16821#discussion_r668288116



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -292,18 +302,24 @@ def list_keys(
             'PageSize': page_size,
             'MaxItems': max_items,
         }
-
         paginator = self.get_conn().get_paginator('list_objects_v2')
         response = paginator.paginate(
-            Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter, 
PaginationConfig=config
+            Bucket=bucket_name,
+            Prefix=prefix,
+            Delimiter=delimiter,
+            PaginationConfig=config,
+            StartAfter=start_after_key,
+        )
+        # JMESPath to query directly on paginated results
+        filtered_response = response.search(
+            "Contents[?to_string("
+            "LastModified)<='\"{}\"' && "
+            "to_string(LastModified)>='\"{"

Review comment:
       i'm not sure adding this ResponseFilter is the right approach.  i am not 
sure we need to add this abstraction layer.
   
   Why not just let the user put an arbitrary callable that does the filtering 
that the user needs?  
   
   For example, your docstring shows this example:
   
   ```python
   object_filter={"LastModified__lt": datetime.now()},
   ```
   
   But with arbitrary callable, this can be implemented about as simply this 
way:
   
   ```python
   object_filter=lambda x: x['LastModified'] < datetime.now(),
   ```
   
   This is roughly as compact and roughly as simple, but the benefit is there's 
no need to understand the options available within the class and how it works, 
and if user wants to do arbitrary filtering they can.
   
   ---
   
   That said, I still think it's probably best to go with your original 
thought, i.e. `from_datetime` and `to_datetime` (though with the updated naming 
/ logic).  
   
   _(I know I initially suggested adding object filter callable, but 
subsequently I walked back that suggestion; since there aren't other useful 
filter attributes in the object metadata, it's not as useful)_.
   
   And your approach of adding only a certain number of allowable operations 
doesn't really provide the flexibility of the callable.
   
   Simply adding the two optional datetime params, from and to, is simple, easy 
to understand, useful.  It's true that with simple from / to params, we can't 
provide flexibility with interval edge inclusivity, but to me that seems like 
an acceptable tradeoff for the simplicity and usabilitiy.  If we go with this 
approach I would probably vote for doing `>=` for `from` and `<` for `to`.
   
   ---
   
   In conclusion i think going with `from` and `to` seems best, but if not that 
then simple arbitrary callable instead of the ResponseFilter apparatus seems 
best.  Interested in what others think.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to