dstandish commented on issue #16627:
URL: https://github.com/apache/airflow/issues/16627#issuecomment-873521269
I did this for our internal repo: I **refactored `list_keys` to call a
`list_objects` method**, so you get the full objects back and can filter on them afterwards:
```python
@provide_bucket_name
def list_objects(
    self,
    bucket_name: Optional[str] = None,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
    start_after_key: Optional[str] = None,
    start_after_time: Optional['DateTime'] = None,
) -> List[S3Object]:
    """
    Lists objects in a bucket under prefix and not containing delimiter

    Args:
        bucket_name: the name of the bucket
        prefix: a key prefix
        delimiter: the delimiter marks key hierarchy.
        page_size: pagination size
        max_items: maximum items to return
        start_after_key: should return only keys greater than this key
        start_after_time: should return only keys with LastModified attr
            greater than this time
    """
    ...
```
This lets you use either `start_after_key` (which `list_objects_v2` supports
natively through its `StartAfter` parameter) or `start_after_time` (which is what
you're after, and which requires listing every object under the prefix and
filtering client-side, because S3 has no "modified after" filter).
And if people want to filter on other object info, that would be easy to add.

I think that might not be a bad way to go here.
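One way the body of `list_objects` might look is sketched below. It is an illustration, not the internal code referred to above: it is written as a standalone function that takes a boto3 S3 client (for example the one returned by `S3Hook.get_conn()`), and `S3Object` is a hypothetical frozen dataclass holding just the fields `list_keys` needs. `start_after_key` is sent to S3 as `StartAfter`, while `start_after_time` is applied client-side to every listed object. It assumes `start_after_time` is timezone-aware, since boto3 returns `LastModified` as an aware datetime.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class S3Object:
    """Hypothetical record for one entry of a list_objects_v2 "Contents" list."""
    Key: str
    LastModified: datetime
    Size: int


def list_objects(
    client: Any,  # assumed: a boto3 S3 client
    bucket_name: str,
    prefix: str = "",
    delimiter: str = "",
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
    start_after_key: Optional[str] = None,
    start_after_time: Optional[datetime] = None,
) -> List[S3Object]:
    """Sketch: list objects under prefix, optionally skipping by key and/or time."""
    params: Dict[str, Any] = {"Bucket": bucket_name}
    if prefix:
        params["Prefix"] = prefix
    if delimiter:
        # Keys below the delimiter come back in "CommonPrefixes", which are ignored here.
        params["Delimiter"] = delimiter
    if start_after_key is not None:
        # Server-side: S3 only returns keys that sort after StartAfter.
        params["StartAfter"] = start_after_key
    if page_size is not None:
        params["PaginationConfig"] = {"PageSize": page_size}

    results: List[S3Object] = []
    for page in client.get_paginator("list_objects_v2").paginate(**params):
        # Pages without matching objects have no "Contents" key at all.
        for entry in page.get("Contents", []):
            # Client-side: keep only objects modified strictly after start_after_time.
            if start_after_time is not None and entry["LastModified"] <= start_after_time:
                continue
            results.append(S3Object(entry["Key"], entry["LastModified"], entry["Size"]))
            # max_items caps the number of matches, not the number of objects scanned.
            if max_items is not None and len(results) >= max_items:
                return results
    return results
```

Usage would be along the lines of `list_objects(hook.get_conn(), "my-bucket", prefix="data/", start_after_time=cutoff)`. Applying `max_items` after the time filter is a deliberate choice in this sketch: passing it to the paginator as `PaginationConfig["MaxItems"]` would cap the objects scanned, so a time-filtered call could return fewer matches than requested.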
Then `list_keys` would look something like this:
```python
@provide_bucket_name
def list_keys(
    self,
    bucket_name: Optional[str] = None,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
    start_after_key: Optional[str] = None,
    start_after_time: Optional['DateTime'] = None,
) -> List[str]:
    # Delegate the listing and filtering to list_objects, then keep only the keys.
    objects = self.list_objects(
        bucket_name=bucket_name,
        prefix=prefix,
        delimiter=delimiter,
        page_size=page_size,
        max_items=max_items,
        start_after_key=start_after_key,
        start_after_time=start_after_time,
    )
    return [o.Key for o in objects]
```
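Since `list_keys` is only a projection over `list_objects`, filtering on other object info can be a plain predicate applied to the listed objects. A minimal sketch, assuming the objects are the dicts found under `Contents` in `list_objects_v2` responses (which carry fields such as `Key`, `Size`, `LastModified` and `StorageClass`); `keys_matching` is a hypothetical helper, not an S3Hook method.

```python
from typing import Any, Callable, Dict, Iterable, List

ObjectDict = Dict[str, Any]


def keys_matching(
    objects: Iterable[ObjectDict], predicate: Callable[[ObjectDict], bool]
) -> List[str]:
    """Return the Key of every object for which predicate(obj) is true, in order."""
    return [obj["Key"] for obj in objects if predicate(obj)]


objects = [
    {"Key": "data/a.csv", "Size": 10, "StorageClass": "STANDARD"},
    {"Key": "data/b.csv", "Size": 0, "StorageClass": "STANDARD"},
    {"Key": "data/c.csv", "Size": 42, "StorageClass": "GLACIER"},
]
# Non-empty objects that are not archived.
print(keys_matching(objects, lambda o: o["Size"] > 0 and o["StorageClass"] == "STANDARD"))
# → ['data/a.csv']
```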