dstandish commented on a change in pull request #16821:
URL: https://github.com/apache/airflow/pull/16821#discussion_r668288116
##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -292,18 +302,24 @@ def list_keys(
'PageSize': page_size,
'MaxItems': max_items,
}
-
paginator = self.get_conn().get_paginator('list_objects_v2')
response = paginator.paginate(
- Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter,
PaginationConfig=config
+ Bucket=bucket_name,
+ Prefix=prefix,
+ Delimiter=delimiter,
+ PaginationConfig=config,
+ StartAfter=start_after_key,
+ )
+ # JMESPath to query directly on paginated results
+ filtered_response = response.search(
+ "Contents[?to_string("
+ "LastModified)<='\"{}\"' && "
+ "to_string(LastModified)>='\"{"
Review comment:
i'm not sure adding this ResponseFilter is the right approach. i am not
sure we need to add this abstraction layer.
Why not just let the user put an arbitrary callable that does the filtering
that the user needs?
For example, your docstring shows this example:
```python
object_filter={"LastModified__lt": datetime.now()},
```
But with arbitrary callable, this can be implemented about as simply this
way:
```python
object_filter=lambda x: x['LastModified'] < datetime.now(),
```
This is roughly as compact and roughly as simple, but the benefit is there's
no need to understand the options available within the class and how it works,
and if user wants to do arbitrary filtering they can.
---
That said, I still think it's probably best to go with your original
thought, i.e. `from_datetime` and `to_datetime` (though with the updated naming
/ logic).
_(I know I initially suggested adding object filter callable, but
subsequently I walked back that suggestion; since there aren't other useful
filter attributes in the object metadata, it's not as useful)_.
And your approach of adding only a certain number of allowable operations
doesn't really provide the flexibility of the callable.
Simply adding the two optional datetime params, from and to, is simple, easy
to understand, useful. It's true that with simple from / to params, we can't
provide flexibility with interval edge inclusivity, but to me that seems like
an acceptable tradeoff for the simplicity and usabilitiy. If we go with this
approach I would probably vote for doing `>=` for `from` and `<` for `to`.
But if _not_ going with simple from / to, I'm in support of simple arbitrary
callable instead of the ResponseFilter apparatus.
What do yall think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]