dstandish commented on a change in pull request #16821:
URL: https://github.com/apache/airflow/pull/16821#discussion_r664912477



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -292,18 +302,24 @@ def list_keys(
             'PageSize': page_size,
             'MaxItems': max_items,
         }
-
         paginator = self.get_conn().get_paginator('list_objects_v2')
         response = paginator.paginate(
-            Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter, 
PaginationConfig=config
+            Bucket=bucket_name,
+            Prefix=prefix,
+            Delimiter=delimiter,
+            PaginationConfig=config,
+            StartAfter=start_after_key,
+        )
+        # JMESPath to query directly on paginated results
+        filtered_response = response.search(

Review comment:
       why are you filtering even when start after is `None`?

##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -283,6 +287,12 @@ def list_keys(
         :type page_size: int
         :param max_items: maximum items to return
         :type max_items: int
+        :param start_after_key: returns keys after this specified key in the 
bucket.
+        :type start_after_key: str
+        :param start_after_datetime: returns keys with last modified datetime 
greater than the specified datetime.
+        :type start_after_datetime: datetime , ISO8601: '%Y-%m-%dT%H:%M:%S%z', 
e.g. 2021-02-20T05:20:20+0000

Review comment:
       if the type is `datetime` then why do you need to specify string 
formatting here?  it makes it look like the type is actually string

##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -269,6 +270,9 @@ def list_keys(
         delimiter: Optional[str] = None,
         page_size: Optional[int] = None,
         max_items: Optional[int] = None,
+        start_after_key: Optional[str] = '',

Review comment:
       I think `None` is the standard default for optional string.  I think 
this is the most intuitive approach, compared with empty string.
   
   Then you can you can add the parameter only conditionally to `paginate`.  To 
me it makes sense to only specify parameters when you actually want to use 
them.  No me `None` is a good default to mean, do not use this param (unless 
supplying `StartAfter=None` in paginate has the same effect as omitting it, in 
which case leaving it in is fine).

##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -292,18 +302,24 @@ def list_keys(
             'PageSize': page_size,
             'MaxItems': max_items,
         }
-
         paginator = self.get_conn().get_paginator('list_objects_v2')
         response = paginator.paginate(
-            Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter, 
PaginationConfig=config
+            Bucket=bucket_name,
+            Prefix=prefix,
+            Delimiter=delimiter,
+            PaginationConfig=config,
+            StartAfter=start_after_key,
+        )
+        # JMESPath to query directly on paginated results
+        filtered_response = response.search(
+            "Contents[?to_string("
+            "LastModified)<='\"{}\"' && "
+            "to_string(LastModified)>='\"{"

Review comment:
       if it's going to be called `start_after_datetime` then this should be 
strictly greater than, i think




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to