[ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=146596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-146596
 ]

ASF GitHub Bot logged work on BEAM-5036:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Sep/18 21:10
            Start Date: 21/Sep/18 21:10
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-423671589
 
 
   Luke, my comment was that the definition of the "right thing" might be 
file-system dependent. For example, HDFS doesn't support overwriting existing 
files, while other file systems might allow it. So we either have to pass the 
responsibility for handling that complexity and providing a unified interface 
to each FileSystem implementation, or handle it in the common FileSystems 
implementation. My argument was that it might be better to handle this 
complexity in a single common place that is developed and maintained by the 
Beam team instead of passing it on to FileSystem authors. 
   
   Tim, in the spirit of the above comment, I prefer handling the 
"OVERWRITE_EXISTING_FILES" option at the FileSystems level instead of passing 
it to the FileSystem interface. I believe most file systems will either fail 
or support overwriting, but will not offer that as an option, and adding this 
to the FileSystem interface would put an extra burden on each FileSystem 
author, since the option might have to be supported by utilizing other methods 
of the FileSystem interface. I believe this complexity has to be pushed to the 
FileSystems level.
   
   Also, I agree that this has become too long for a PR comment thread. 
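
   A minimal sketch of what handling the overwrite option once, in a common 
layer, could look like. Everything here is an assumption made for 
illustration: `SimpleFileSystem`, `StrictInMemoryFileSystem`, and 
`renameOverwriting` are hypothetical names and are not Beam's actual 
FileSystem or FileSystems API. The strict in-memory implementation stands in 
for a file system that refuses to overwrite an existing destination.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.Map;

public class OverwriteSketch {

  /** Hypothetical minimal per-file-system interface (not Beam's FileSystem). */
  interface SimpleFileSystem {
    boolean exists(String path);

    void delete(String path) throws IOException;

    /** May throw FileAlreadyExistsException when dest already exists. */
    void rename(String src, String dest) throws IOException;
  }

  /** In-memory stand-in for a file system whose rename never overwrites. */
  static class StrictInMemoryFileSystem implements SimpleFileSystem {
    final Map<String, String> files = new HashMap<>();

    @Override
    public boolean exists(String path) {
      return files.containsKey(path);
    }

    @Override
    public void delete(String path) {
      files.remove(path);
    }

    @Override
    public void rename(String src, String dest) throws IOException {
      if (!files.containsKey(src)) {
        throw new NoSuchFileException(src);
      }
      if (files.containsKey(dest)) {
        throw new FileAlreadyExistsException(dest);
      }
      files.put(dest, files.remove(src));
    }
  }

  /**
   * Common-layer helper: overwrite semantics are implemented once, using only
   * the basic primitives, so individual file systems need no overwrite flag.
   * Note that delete followed by rename is not atomic.
   */
  static void renameOverwriting(SimpleFileSystem fs, String src, String dest)
      throws IOException {
    try {
      // Works directly on file systems that overwrite silently.
      fs.rename(src, dest);
    } catch (FileAlreadyExistsException e) {
      // Strict file systems: remove the stale destination, then retry.
      fs.delete(dest);
      fs.rename(src, dest);
    }
  }

  public static void main(String[] args) throws IOException {
    StrictInMemoryFileSystem fs = new StrictInMemoryFileSystem();
    fs.files.put("tmp/part-0", "new");
    fs.files.put("out/part-0", "old");
    renameOverwriting(fs, "tmp/part-0", "out/part-0");
    System.out.println(fs.files);
  }
}
```

   With this shape, a file system that overwrites silently and one that fails 
on an existing destination both end up with the same behaviour from the 
caller's point of view, and neither implementation has to know about the 
option.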

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 146596)
    Time Spent: 3h 40m  (was: 3.5h)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> ------------------------------------------------------
>
>                 Key: BEAM-5036
>                 URL: https://issues.apache.org/jira/browse/BEAM-5036
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>    Affects Versions: 2.5.0
>            Reporter: Jozef Vilcek
>            Assignee: Tim Robertson
>            Priority: Major
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The moveToOutput() method in FileBasedSink.WriteOperation implements move as 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
