[ 
https://issues.apache.org/jira/browse/BEAM-14314?focusedWorklogId=763089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763089
 ]

ASF GitHub Bot logged work on BEAM-14314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Apr/22 18:25
            Start Date: 27/Apr/22 18:25
    Worklog Time Spent: 10m 
      Work Description: Abacn commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r860121749


##########
sdks/python/apache_beam/io/hadoopfilesystem_test.py:
##########
@@ -538,6 +539,14 @@ def test_checksum(self):
     self.assertEqual(
         'fake_algo-5-checksum_byte_sequence', self.fs.checksum(url))
 
+  def test_last_updated(self):
+    url = self.fs.join(self.tmpdir, 'f1')
+    with self.fs.create(url) as f:
+      f.write(b'Hello')
+    tolerance = 60  # 1 min
+    result = self.fs.last_updated(url)
+    self.assertAlmostEqual(result, time.time(), delta=tolerance)

Review Comment:
   This mirrors a once-sickbayed test, 
[s3io_test.test_last_updated](https://github.com/apache/beam/blob/da6acf212e93aef266630f36624f5d23a1a93801/sdks/python/apache_beam/io/aws/s3io_test.py#L116),
 which set the tolerance to 5 min. I do not know the reasoning behind that original 
setting, but once the failure is resolved, the time difference should be 
near zero: it is just the interval between creating the fake file and the assertion 
statement. If this causes confusion, we could set the tolerance to the same 
small value in both places.
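
   As a rough illustration of why a small tolerance should be enough, here is a 
minimal, self-contained sketch. `FakeClockFile` and `LastUpdatedTest` are 
hypothetical names used only for this example; they are not the Beam test 
classes. The sketch assumes the timestamp is captured in milliseconds at 
creation time, as in the diff above.

```python
import time
import unittest


class FakeClockFile:
  """Hypothetical stand-in for a fake file that records its creation time."""
  def __init__(self, path, time_ms=None):
    if time_ms is None:
      time_ms = int(time.time() * 1000)
    self.path = path
    self.time_ms = time_ms

  def last_updated(self):
    # Seconds since the epoch, as a float.
    return self.time_ms / 1000.0


class LastUpdatedTest(unittest.TestCase):
  def test_last_updated_small_tolerance(self):
    before = time.time()
    f = FakeClockFile('/tmp/f1')
    after = time.time()
    # int() truncation to whole milliseconds can lose up to 1 ms.
    self.assertGreaterEqual(f.last_updated(), before - 0.001)
    self.assertLessEqual(f.last_updated(), after)
```

   Bracketing the creation between two clock reads avoids picking a tolerance 
at all, though a fixed small `delta` with `assertAlmostEqual` would also work.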



##########
sdks/python/apache_beam/io/hadoopfilesystem_test.py:
##########
@@ -36,17 +37,16 @@ class FakeFile(io.BytesIO):
   """File object for FakeHdfs"""
   __hash__ = None  # type: ignore[assignment]
 
-  def __init__(self, path, mode='', type='FILE'):
+  def __init__(self, path, mode='', type='FILE', time_ms=None):
     io.BytesIO.__init__(self)
-
-    self.stat = {
-        'path': path,
-        'mode': mode,
-        'type': type,
-    }
+    if time_ms is None:
+      time_ms = int(time.time() * 1000)
+    self.time_ms = time_ms
+    self.stat = {'path': path, 'mode': mode, 'type': type}
     self.saved_data = None
 
   def __eq__(self, other):
+    """Equality of two files. Timestamp not included in comparison"""

Review Comment:
   Including the timestamp would break the current assertEquals used 
[here](https://github.com/apache/beam/blob/7e2f746da779688657984c5987a39fb38c736b92/sdks/python/apache_beam/io/hadoopfilesystem_test.py#L368).
 Since the timestamps of two "identical" file objects are almost certainly 
different, I did not change the behavior of `__eq__` or the test involved 
here.
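
   To make the point concrete, here is a minimal sketch of an equality check 
that ignores the timestamp. `FakeFileSketch` is a hypothetical name, and the 
body of `__eq__` is an assumption for illustration (it compares `stat` and the 
buffer contents); it is not necessarily the exact implementation in 
hadoopfilesystem_test.py.

```python
import io
import time


class FakeFileSketch(io.BytesIO):
  """Hypothetical fake file whose equality ignores the creation timestamp."""
  __hash__ = None  # Instances are mutable, so they are left unhashable.

  def __init__(self, path, mode='', type='FILE', time_ms=None):
    io.BytesIO.__init__(self)
    if time_ms is None:
      time_ms = int(time.time() * 1000)
    self.time_ms = time_ms
    self.stat = {'path': path, 'mode': mode, 'type': type}

  def __eq__(self, other):
    """Equality of two files. Timestamp not included in comparison."""
    # Assumed comparison: metadata and contents only, never time_ms.
    return self.stat == other.stat and self.getvalue() == other.getvalue()


# Two files created at different times still compare equal.
a = FakeFileSketch('/tmp/f1', time_ms=1000)
b = FakeFileSketch('/tmp/f1', time_ms=2000)
assert a == b
```

   Keeping `time_ms` out of `stat` means existing equality assertions on file 
objects do not depend on when each object was constructed.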





Issue Time Tracking
-------------------

    Worklog Id:     (was: 763089)
    Time Spent: 3.5h  (was: 3h 20m)

> Add last_updated field in filesystem.FileMetaData
> -------------------------------------------------
>
>                 Key: BEAM-14314
>                 URL: https://issues.apache.org/jira/browse/BEAM-14314
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-common
>            Reporter: Yi Hu
>            Assignee: Yi Hu
>            Priority: P2
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This will be the python counterpart of BEAM-5910
> Per python naming convention, the field will be named as 
> "last_updated_in_seconds".



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
