dstandish commented on a change in pull request #16821:
URL: https://github.com/apache/airflow/pull/16821#discussion_r668288116
##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -292,18 +302,24 @@ def list_keys(
'PageSize': page_size,
'MaxItems': max_items,
}
-
paginator = self.get_conn().get_paginator('list_objects_v2')
response = paginator.paginate(
- Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter,
PaginationConfig=config
+ Bucket=bucket_name,
+ Prefix=prefix,
+ Delimiter=delimiter,
+ PaginationConfig=config,
+ StartAfter=start_after_key,
+ )
+ # JMESPath to query directly on paginated results
+ filtered_response = response.search(
+ "Contents[?to_string("
+ "LastModified)<='\"{}\"' && "
+ "to_string(LastModified)>='\"{"
Review comment:
i'm not sure adding this ResponseFilter is the right approach. i am not
sure we need to add this abstraction layer.
Why not just let the user put an arbitrary callable that does the filtering
that the user needs?
For example, your docstring shows this example:
```python
object_filter={"LastModified__lt": datetime.now()},
```
But with arbitrary callable, this can be implemented about as simply this
way:
```python
object_filter=lambda x: x['LastModified'] < datetime.now(),
```
This is roughly as compact and roughly as simple, but the benefit is there's
no need to understand the options available within the class and how it works,
and if user wants to do arbitrary filtering they can.
---
That said, I still think it's probably best to go with your original
thought, i.e. `from_datetime` and `to_datetime` (though with the updated naming
/ logic).
_(I know I initially suggested adding object filter callable, but
subsequently I walked back that suggestion; since there aren't other useful
filter attributes in the object metadata, it's not as useful)_.
And your approach of adding only a certain number of allowable operations
doesn't really provide the flexibility of the callable.
Simply adding the two optional datetime params, from and to, is simple, easy
to understand, useful. It's true that with simple from / to params, we can't
provide flexibility with interval edge inclusivity, but to me that seems like
an acceptable tradeoff for the simplicity and usabilitiy. If we go with this
approach I would probably vote for doing `>=` for `from` and `<` for `to`.
---
In conclusion i think going with `from` and `to` seems best, but if not that
then simple arbitrary callable instead of the ResponseFilter apparatus seems
best. Interested in what others think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]