Abacn commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r852409977


##########
sdks/python/apache_beam/io/hadoopfilesystem.py:
##########
@@ -399,6 +407,26 @@ def checksum(self, url):
         file_checksum[_FILE_CHECKSUM_BYTES],
     )
 
+  def metadata(self, url):
+    """Fetch metadata fields of a file on the FileSystem.
+
+    Args:
+      url: string url of a file.
+
+    Returns:
+      :class:`~apache_beam.io.filesystem.FileMetadata`.
+      Note: last_updated field is not supported yet.
+
+    Raises:
+      ``BeamIOError``: if url doesn't exist.
+    """
+    _, path = self._parse_url(url)
+    status = self._hdfs_client.status(path, strict=False)
+    print(status)
+    if status is None:
+      raise BeamIOError('File not found: %s' % url)
+    return FileMetadata(url, status[_FILE_STATUS_LENGTH])

Review Comment:
   I believe so. According to
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#FileStatus, the returned JSON
should include a 'modificationTime' field. Implementing this also involves
updating the FakeFile class in the unit tests.
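
   A rough sketch of what that could look like, not a tested patch. It assumes
the status dict mirrors the WebHDFS FileStatus JSON, where 'modificationTime'
is given in milliseconds since the epoch. The `FileMetadata` tuple below is
only a stand-in for the real Beam class, and `_FILE_STATUS_UPDATED`,
`metadata_from_status`, and the `last_updated_in_seconds` field name are
hypothetical names used for illustration.

```python
import collections

# Stand-in for apache_beam.io.filesystem.FileMetadata so the sketch runs on
# its own; the third field name is an assumption, not the confirmed API.
FileMetadata = collections.namedtuple(
    'FileMetadata', ['path', 'size_in_bytes', 'last_updated_in_seconds'])

_FILE_STATUS_LENGTH = 'length'
# Hypothetical constant name; the key itself comes from the WebHDFS docs.
_FILE_STATUS_UPDATED = 'modificationTime'


def metadata_from_status(url, status):
  """Builds a FileMetadata from a WebHDFS-style FileStatus dict."""
  if status is None:
    # The real method raises BeamIOError; IOError keeps the sketch standalone.
    raise IOError('File not found: %s' % url)
  # WebHDFS reports modificationTime in milliseconds; convert to seconds.
  last_updated = status[_FILE_STATUS_UPDATED] / 1000.0
  return FileMetadata(url, status[_FILE_STATUS_LENGTH], last_updated)
```

   With something like this in place, the status returned by the fake client
in the unit test would also need to carry a 'modificationTime' entry, otherwise
the lookup above raises a KeyError.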


