dstandish edited a comment on pull request #16821:
URL: https://github.com/apache/airflow/pull/16821#issuecomment-875262841


   > Let me offer a recommendation.
   > 
   > Rather than adding `start_after_datetime` and `to_datetime`, just add a 
single param called `object_filter` that takes a function `(obj: dict) -> bool`
   > 
   > then this function can be applied like so:
   > 
   > ```python
   >         for page in response:
   >             if 'Contents' in page:
   >                 for k in page['Contents']:
   >                     <evaluate object filter here and only append 
contditionally>
   >                     keys.append(k['Key'])
   > ```
   > 
   > with obj filter people can apply whatevery logic they want and they don't 
have to understand what start_from_datetime / to_datetime mean and how to use 
(since they aren't part of the s3 api)
   > 
   > what do yall think
   
   Maybe this isn't the greatest idea...  I thought there were more attributes 
available in the response.
   
   Here's what you get in the response:
   
   ```
   {'Key': 'tmp/hello/1.txt',
    'LastModified': datetime.datetime(2021, 7, 6, 3, 13, 41, tzinfo=tzutc()),
    'ETag': '"49f68a5c8493ec2c0bf489821c21fc3b"',
    'Size': 2,
    'StorageClass': 'STANDARD'}
   ```
   
   So I guess there isn't much else in that response that you'd want to filter 
on.
   
   Still, putting the filtering in the user's hands (through the use of an 
`object_filter` callable would remove the ambiguity of strictly greater than vs 
greater than (and same with less than)?
   
   Also there's the problem of naming.
   
   If we include both parameters `from` and `to`, we should probably call them 
`from_datetime` and `to_datetime`, where `from` is `>=` and `to` is `<`.  
   
   One other thing.... I am not sure why we use the jmespath when we already 
have the datetime object available in the for loop in the existing code.  I 
think it is best to compare `LastModified` directly with the user supplied 
datetime rather than converting everything to string and doing a json search.  
For one this is additional transformation that inherently could introduce an 
error through formatting difference.  I also suspect the json search is 
significantly less efficient but not sure.  Bottom line, if you already have 
datetime on both sides of the comparison, why convert them both to string to 
compare them?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to