[ 
https://issues.apache.org/jira/browse/BEAM-14314?focusedWorklogId=758051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758051
 ]

ASF GitHub Bot logged work on BEAM-14314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Apr/22 18:22
            Start Date: 18/Apr/22 18:22
    Worklog Time Spent: 10m 
      Work Description: johnjcasey commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r852311644


##########
sdks/python/apache_beam/io/azure/blobstorageio.py:
##########
@@ -372,29 +355,47 @@ def last_updated(self, path):
     Returns: last updated time of the Azure Blob Storage blob
     in seconds.
     """
-    container, blob = parse_azfs_path(path)
-    blob_to_check = self.client.get_blob_client(container, blob)
-    try:
-      properties = blob_to_check.get_blob_properties()
-    except ResourceNotFoundError as e:
-      message = e.reason
-      code = e.status_code
-      raise BlobStorageError(message, code)
+    return self._updated_to_seconds(self._blob_properties(path).last_modified)
 
-    datatime = properties.last_modified
-    return (
-        time.mktime(datatime.timetuple()) - time.timezone +
-        datatime.microsecond / 1000000.0)
-
-  @retry.with_exponential_backoff(
-      retry_filter=retry.retry_on_beam_io_error_filter)
   def checksum(self, path):
     """Looks up the checksum of an Azure Blob Storage blob.
 
     Args:
       path: Azure Blob Storage file path pattern in the form
             azfs://<storage-account>/<container>/[name].
     """
+    return self._blob_properties(path).properties.etag
+
+  def _vars(self, path):

Review Comment:
   can we improve the name here too?



##########
sdks/python/apache_beam/io/filesystem.py:
##########
@@ -421,27 +421,44 @@ def __exit__(self, exception_type, exception_value, 
traceback):
 
 
 class FileMetadata(object):
-  """Metadata about a file path that is the output of FileSystem.match."""
-  def __init__(self, path, size_in_bytes):
+  """Metadata about a file path that is the output of FileSystem.match.
+
+  Fields:
+    path: [Required] file path.
+    size_in_bytes: [Required] file size in bytes.
+    last_updated_in_seconds: [Optional] last modified timestamp of the file, or
+    valued 0.0 if not specified.

Review Comment:
   Does a timestamp in seconds match what we have in java? I would expect it to 
be millis since epoch generally



##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -532,39 +510,83 @@ def last_updated(self, path):
 
     Returns: last updated time of the GCS object in second.
     """
+    return self._updated_to_seconds(self._gcs_object(path).updated)
+
+  def _vars(self, path):

Review Comment:
   I may be missing context, but _vars doesn't tell me much about what this 
method does. Can we get a better name here?



##########
sdks/python/apache_beam/io/aws/s3io.py:
##########
@@ -464,6 +459,47 @@ def exists(self, path):
         # We re-raise all other exceptions
         raise
 
+  def _vars(self, path):

Review Comment:
   same here



##########
sdks/python/apache_beam/io/hadoopfilesystem.py:
##########
@@ -399,6 +407,26 @@ def checksum(self, url):
         file_checksum[_FILE_CHECKSUM_BYTES],
     )
 
+  def metadata(self, url):
+    """Fetch metadata fields of a file on the FileSystem.
+
+    Args:
+      url: string url of a file.
+
+    Returns:
+      :class:`~apache_beam.io.filesystem.FileMetadata`.
+      Note: last_updated field is not supported yet.
+
+    Raises:
+      ``BeamIOError``: if url doesn't exist.
+    """
+    _, path = self._parse_url(url)
+    status = self._hdfs_client.status(path, strict=False)
+    print(status)
+    if status is None:
+      raise BeamIOError('File not found: %s' % url)
+    return FileMetadata(url, status[_FILE_STATUS_LENGTH])

Review Comment:
   does hdfs have a way to get the timestamp?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 758051)
    Time Spent: 1h  (was: 50m)

> Add last_updated field in filesystem.FileMetaData
> -------------------------------------------------
>
>                 Key: BEAM-14314
>                 URL: https://issues.apache.org/jira/browse/BEAM-14314
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-common
>            Reporter: Yi Hu
>            Assignee: Yi Hu
>            Priority: P2
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This will be the python counterpart of BEAM-5910
> Per python naming convention, the field will be named as 
> "last_updated_in_seconds".



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to