[ 
https://issues.apache.org/jira/browse/BEAM-14314?focusedWorklogId=765142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765142
 ]

ASF GitHub Bot logged work on BEAM-14314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/May/22 21:04
            Start Date: 02/May/22 21:04
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r863189335


##########
sdks/python/apache_beam/io/azure/blobstorageio.py:
##########
@@ -559,40 +569,54 @@ def _delete_batch(self, container, blobs):
 
   @retry.with_exponential_backoff(
       retry_filter=retry.retry_on_beam_io_error_filter)
-  def list_prefix(self, path):
+  def list_prefix(self, path, with_metadata=False):
     """Lists files matching the prefix.
 
     Args:
       path: Azure Blob Storage file path pattern in the form
             azfs://<storage-account>/<container>/[name].
+      with_metadata: Experimental. Specify whether returns file metadata.
 
     Returns:
-      Dictionary of file name -> size.
+      If ``with_metadata`` is False: dict of file name -> size; if
+        ``with_metadata`` is True: dict of file name -> tuple(size, timestamp).
     """
     storage_account, container, blob = parse_azfs_path(
         path, blob_optional=True, get_account=True)
-    file_sizes = {}
+    file_info = {}
     counter = 0
     start_time = time.time()
 
-    logging.info("Starting the size estimation of the input")
+    if with_metadata:
+      logging.info("Starting the file information of the input")
+    else:
+      logging.info("Starting the size estimation of the input")
     container_client = self.client.get_container_client(container)
 
     while True:
       response = container_client.list_blobs(name_starts_with=blob)
       for item in response:
         file_name = "azfs://%s/%s/%s" % (storage_account, container, item.name)
-        file_sizes[file_name] = item.size
+        if with_metadata:
+          file_info[file_name] = (
+              item.size, self._updated_to_seconds(item.last_modified))

Review Comment:
   To be honest, I would prefer that this always return a namedtuple, instead 
of a tuple or a bare value depending on the arguments.
   
   We don't need to change this now, since it's not officially a public API, 
but it may be worth cleaning up in the future. Maybe add a JIRA issue with a 
target version of Beam 3.0.0?
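
   For illustration only, a minimal sketch of what a uniform return type could 
look like. `BlobInfo` and `collect_blob_info` are hypothetical names that are 
not part of Beam or of this PR, and the input is assumed to be plain 
(name, size, last_modified_seconds) tuples rather than Azure blob objects:

```python
from typing import NamedTuple, Optional


class BlobInfo(NamedTuple):
  """Hypothetical uniform value type; not part of the Beam API."""
  size: int
  # None when the caller did not ask for metadata.
  last_updated_in_seconds: Optional[float] = None


def collect_blob_info(items, with_metadata=False):
  """Builds a dict of file name -> BlobInfo.

  Args:
    items: iterable of (name, size, last_modified_seconds) tuples
      (an assumed stand-in for the listing response).
    with_metadata: whether to keep the timestamp in the result.
  """
  file_info = {}
  for name, size, last_modified in items:
    timestamp = last_modified if with_metadata else None
    file_info[name] = BlobInfo(size, timestamp)
  return file_info
```

   With this shape, callers always read `.size` and check 
`.last_updated_in_seconds` for None, so the value type no longer depends on 
the `with_metadata` flag.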





Issue Time Tracking
-------------------

    Worklog Id:     (was: 765142)
    Time Spent: 4h 40m  (was: 4.5h)

> Add last_updated field in filesystem.FileMetaData
> -------------------------------------------------
>
>                 Key: BEAM-14314
>                 URL: https://issues.apache.org/jira/browse/BEAM-14314
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-common
>            Reporter: Yi Hu
>            Assignee: Yi Hu
>            Priority: P2
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This will be the Python counterpart of BEAM-5910.
> Per Python naming convention, the field will be named 
> "last_updated_in_seconds".



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
