[
https://issues.apache.org/jira/browse/BEAM-14314?focusedWorklogId=763089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763089
]
ASF GitHub Bot logged work on BEAM-14314:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 18:25
Start Date: 27/Apr/22 18:25
Worklog Time Spent: 10m
Work Description: Abacn commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r860121749
##########
sdks/python/apache_beam/io/hadoopfilesystem_test.py:
##########
@@ -538,6 +539,14 @@ def test_checksum(self):
self.assertEqual(
'fake_algo-5-checksum_byte_sequence', self.fs.checksum(url))
+ def test_last_updated(self):
+ url = self.fs.join(self.tmpdir, 'f1')
+ with self.fs.create(url) as f:
+ f.write(b'Hello')
+ tolerance = 60 # 1 min
+ result = self.fs.last_updated(url)
+ self.assertAlmostEqual(result, time.time(), delta=tolerance)
Review Comment:
This mirrors a once-sickbayed test,
[s3io_test.test_last_updated](https://github.com/apache/beam/blob/da6acf212e93aef266630f36624f5d23a1a93801/sdks/python/apache_beam/io/aws/s3io_test.py#L116),
which set the tolerance to 5 min. I do not know the reasoning behind that original
setting, but once the failure is resolved, the time difference should be
negligible: it is just the interval between creating the fake file and running the
assertion. If this causes confusion, we could set the tolerance to the same
small value in both places.
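
To illustrate the timing argument, here is a minimal sketch. It uses a hypothetical in-memory `FakeFs`, not Beam's actual test fixture, and the 5-second tolerance is an assumed value chosen only for illustration. The measured gap is just the time between creating the fake file and reading its timestamp back, so a small tolerance should be enough.

```python
import time


class FakeFs:
  """Hypothetical in-memory stand-in; not Beam's FakeHdfs."""
  def __init__(self):
    self._mtime = {}

  def create(self, path):
    # Record the creation time in seconds since the epoch.
    self._mtime[path] = time.time()

  def last_updated(self, path):
    return self._mtime[path]


fs = FakeFs()
fs.create('/tmp/f1')

# The only delay is between create() and this read-back.
elapsed = abs(time.time() - fs.last_updated('/tmp/f1'))
tolerance = 5  # seconds; assumed value for illustration
assert elapsed < tolerance
```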
##########
sdks/python/apache_beam/io/hadoopfilesystem_test.py:
##########
@@ -36,17 +37,16 @@ class FakeFile(io.BytesIO):
"""File object for FakeHdfs"""
__hash__ = None # type: ignore[assignment]
- def __init__(self, path, mode='', type='FILE'):
+ def __init__(self, path, mode='', type='FILE', time_ms=None):
io.BytesIO.__init__(self)
-
- self.stat = {
- 'path': path,
- 'mode': mode,
- 'type': type,
- }
+ if time_ms is None:
+ time_ms = int(time.time() * 1000)
+ self.time_ms = time_ms
+ self.stat = {'path': path, 'mode': mode, 'type': type}
self.saved_data = None
def __eq__(self, other):
+ """Equality of two files. Timestamp not included in comparison"""
Review Comment:
Including the timestamp would break the current assertEquals used
[here](https://github.com/apache/beam/blob/7e2f746da779688657984c5987a39fb38c736b92/sdks/python/apache_beam/io/hadoopfilesystem_test.py#L368).
Since the timestamps of two otherwise "identical" file objects are almost certainly
different, I did not change the behavior of `__eq__` or the test involved
here.
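
A simplified sketch of the behavior described above, assuming equality is decided on the `stat` dictionary alone (the real `FakeFile.__eq__` may compare additional state): two files created at different times still compare equal because `time_ms` is left out of the comparison.

```python
import io
import time


class FakeFile(io.BytesIO):
  """Simplified sketch of the test double; only the fields relevant here."""
  __hash__ = None  # instances are intentionally unhashable

  def __init__(self, path, mode='', type='FILE', time_ms=None):
    super().__init__()
    if time_ms is None:
      time_ms = int(time.time() * 1000)
    self.time_ms = time_ms
    self.stat = {'path': path, 'mode': mode, 'type': type}

  def __eq__(self, other):
    # The timestamp is deliberately excluded from the comparison.
    return isinstance(other, FakeFile) and self.stat == other.stat


a = FakeFile('/tmp/f1', time_ms=1000)
b = FakeFile('/tmp/f1', time_ms=2000)
assert a == b  # equal despite different timestamps
assert a.time_ms != b.time_ms
```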
Issue Time Tracking
-------------------
Worklog Id: (was: 763089)
Time Spent: 3.5h (was: 3h 20m)
> Add last_updated field in filesystem.FileMetaData
> -------------------------------------------------
>
> Key: BEAM-14314
> URL: https://issues.apache.org/jira/browse/BEAM-14314
> Project: Beam
> Issue Type: New Feature
> Components: io-py-common
> Reporter: Yi Hu
> Assignee: Yi Hu
> Priority: P2
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> This will be the Python counterpart of BEAM-5910.
> Per Python naming convention, the field will be named
> "last_updated_in_seconds".