[
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=146596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-146596
]
ASF GitHub Bot logged work on BEAM-5036:
----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Sep/18 21:10
Start Date: 21/Sep/18 21:10
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #6289: [BEAM-5036]
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-423671589
Luke, my comment was that the definition of "right thing" might be file-system
dependent. For example, HDFS doesn't support overwriting existing files, while
other file systems might allow this. So we either have to pass the
responsibility for handling that complexity and providing a unified interface to
each FileSystem implementation, or handle it in the common FileSystems
implementation. My argument was that it might be better to handle this
complexity in a single common place that is developed and maintained by the Beam
team instead of passing it to FileSystem authors.
Tim, in the spirit of the above comment, I prefer handling the
"OVERWRITE_EXISTING_FILES" option at the FileSystems level instead of passing it
to the FileSystem interface. I believe most file systems will either fail or
support overwriting, but will not offer that as an option, and adding this to
the FileSystem interface will add an extra burden on each FileSystem author,
since this option might have to be supported by utilizing other methods of the
FileSystem interface. I believe this complexity has to be pushed to the
FileSystems level.
Also, I agree that this thread has become too long for a PR comment thread.
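To make the proposal above concrete, here is a minimal sketch of what handling
the overwrite option once, in a common layer, could look like. Everything in it
is hypothetical and for illustration only: MiniFileSystem, NoOverwriteFileSystem
and CommonFileOps are made-up names and not Beam's actual FileSystem or
FileSystems API. The in-memory file system stands in for one that refuses to
overwrite on rename, as described above for HDFS.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, pared-down per-file-system contract (NOT Beam's FileSystem API). */
interface MiniFileSystem {
  boolean exists(String path);

  void delete(String path);

  /** May throw if dst already exists, as on file systems that refuse to overwrite. */
  void rename(String src, String dst);
}

/** In-memory stand-in for a file system that refuses to overwrite on rename. */
class NoOverwriteFileSystem implements MiniFileSystem {
  final Map<String, String> files = new HashMap<>();

  @Override
  public boolean exists(String path) {
    return files.containsKey(path);
  }

  @Override
  public void delete(String path) {
    files.remove(path);
  }

  @Override
  public void rename(String src, String dst) {
    if (!files.containsKey(src)) {
      throw new IllegalStateException("source missing: " + src);
    }
    if (files.containsKey(dst)) {
      throw new IllegalStateException("destination exists: " + dst);
    }
    files.put(dst, files.remove(src));
  }
}

/** Hypothetical common layer: the overwrite policy is implemented here, once. */
final class CommonFileOps {
  private CommonFileOps() {}

  static void rename(MiniFileSystem fs, String src, String dst, boolean overwriteExisting) {
    // Built only from the basic operations every implementation already has,
    // so file system authors do not need to support an extra option.
    // Note: delete followed by rename is not atomic in this sketch.
    if (overwriteExisting && fs.exists(dst)) {
      fs.delete(dst);
    }
    fs.rename(src, dst);
  }
}
```

With this shape, a file system author only implements exists, delete and
rename; whether an existing destination is replaced is decided in one place.
The trade-off is that the common layer cannot use a native atomic overwrite
even where a particular file system offers one.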
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 146596)
Time Spent: 3h 40m (was: 3.5h)
> Optimize FileBasedSink's WriteOperation.moveToOutput()
> ------------------------------------------------------
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
> Issue Type: Improvement
> Components: io-java-files
> Affects Versions: 2.5.0
> Reporter: Jozef Vilcek
> Assignee: Tim Robertson
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move by
> copy+delete. It would be better to use rename(), which can be much more
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)