[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=247572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247572 ]
ASF GitHub Bot logged work on BEAM-2857:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/May/19 17:35
Start Date: 23/May/19 17:35
Worklog Time Spent: 10m
Work Description: udim commented on issue #8394: [BEAM-2857] Implementing
WriteToFiles transform for fileio (Python)
URL: https://github.com/apache/beam/pull/8394#issuecomment-495313662
Yes, it applies to cases where the pipeline fails, but it could also apply when
the pipeline is cancelled or updated (does the temp directory change in that
case?).
We could include the job ID in temp directory names, so that it is possible to
check whether the job is still running and delete its temp files if it isn't.
This cleanup process would run outside the pipeline, not as part of it.
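A minimal sketch of the job-ID naming idea, assuming a local filesystem and stdlib only (not Beam code). The prefix "beam-temp-", the helpers make_temp_dir and cleanup_orphaned_temp_dirs, and the caller-supplied is_job_running check are all hypothetical; how a real script would ask the runner about job status is left open.

```python
import os
import shutil
import tempfile

# Hypothetical naming scheme: "<prefix><job_id>" under a shared temp root.
TEMP_PREFIX = "beam-temp-"


def make_temp_dir(root, job_id):
    """Create (or reuse) a temp directory whose name embeds the job ID."""
    path = os.path.join(root, TEMP_PREFIX + job_id)
    os.makedirs(path, exist_ok=True)
    return path


def cleanup_orphaned_temp_dirs(root, is_job_running):
    """Delete temp directories whose job is no longer running.

    Meant to run outside the pipeline, e.g. as a periodic maintenance
    script. `is_job_running` is a hypothetical caller-supplied function
    mapping a job ID to True/False.
    """
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Ignore anything that does not follow the naming scheme.
        if not (name.startswith(TEMP_PREFIX) and os.path.isdir(path)):
            continue
        job_id = name[len(TEMP_PREFIX):]
        if not is_job_running(job_id):
            shutil.rmtree(path)
            removed.append(job_id)
    return removed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    make_temp_dir(root, "job-1")
    make_temp_dir(root, "job-2")
    running = {"job-2"}
    # Only job-1's directory is removed, because job-2 is still running.
    print(cleanup_orphaned_temp_dirs(root, lambda job_id: job_id in running))
```

Encoding the job ID in the directory name is what lets the cleanup script work without any state shared with the pipeline.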
Regarding temp directory deletion, I believe temp directories should be removed
at the end of batch jobs (because it's better to clean up when we can). For
streaming jobs, AFAIK there's no hook for cleanup during cancellation or
updates.
Regarding failed bundles, assuming that the pipeline has not failed, it
should be possible to scan for and delete any remaining temporary files.
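A minimal sketch of that scan, assuming (hypothetically) that each bundle writes temp files named "<bundle_id>-<suffix>" with bundle IDs containing no "-", and that committed files have already been moved to their final destination. Under those assumptions, any file not owned by a still-running bundle is left over from a failed or retried bundle. The function name and the remove_dir flag are illustrative, not Beam's API; stdlib only.

```python
import os
import tempfile


def delete_leftover_temp_files(temp_dir, live_bundle_ids, remove_dir=False):
    """Delete temp files that no in-flight bundle still owns.

    Hypothetical convention: files are named "<bundle_id>-<suffix>" and
    bundle IDs contain no "-". Committed files are assumed to have been
    moved out of temp_dir already.

    For a batch job, call this at the end with no live bundles and
    remove_dir=True to also drop the (now empty) temp directory.
    """
    deleted = []
    for name in sorted(os.listdir(temp_dir)):
        path = os.path.join(temp_dir, name)
        if not os.path.isfile(path):
            continue
        bundle_id = name.split("-", 1)[0]
        if bundle_id not in live_bundle_ids:
            os.remove(path)
            deleted.append(name)
    # Only remove the directory if nothing else is left in it.
    if remove_dir and not os.listdir(temp_dir):
        os.rmdir(temp_dir)
    return deleted


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for n in ["b1-part0", "b2-part0"]:
        open(os.path.join(d, n), "w").close()
    # Streaming-style scan: bundle b2 is still running, so only b1's file goes.
    print(delete_leftover_temp_files(d, {"b2"}))
    # Batch-style end of job: nothing is live, so delete everything and the dir.
    print(delete_leftover_temp_files(d, set(), remove_dir=True))
```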
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 247572)
Time Spent: 4h 50m (was: 4h 40m)
> Create FileIO in Python
> -----------------------
>
> Key: BEAM-2857
> URL: https://issues.apache.org/jira/browse/BEAM-2857
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Eugene Kirpichov
> Assignee: Pablo Estrada
> Priority: Major
> Labels: gsoc, gsoc2019, mentor
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> Beam Java has a FileIO with operations: match()/matchAll(), readMatches(),
> which together cover the majority of needs for general-purpose file
> ingestion. Beam Python should have something similar.
> An early design document for this: https://s.apache.org/fileio-beam-python
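The match-then-read split described above can be illustrated with a stdlib-only sketch: one step expands file patterns into metadata (path and size), and a separate step opens each matched file. The names match_all, read_matches, and FileMetadata are hypothetical stand-ins that mirror the Java operation names; this is not Beam's API, and it runs locally rather than as a distributed transform.

```python
import glob
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical record: what a match step might yield per file.
    path: str
    size_in_bytes: int


def match_all(patterns):
    """Expand each glob pattern into metadata for the files it matches."""
    results = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):
                results.append(FileMetadata(path, os.path.getsize(path)))
    return results


def read_matches(metadata_list):
    """Pair each matched file's metadata with its raw contents."""
    for meta in metadata_list:
        with open(meta.path, "rb") as f:
            yield meta, f.read()


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for name, text in [("a.txt", "hello"), ("b.txt", "hi"), ("c.csv", "x")]:
        with open(os.path.join(d, name), "w") as f:
            f.write(text)
    matches = match_all([os.path.join(d, "*.txt")])
    for meta, data in read_matches(matches):
        print(os.path.basename(meta.path), meta.size_in_bytes, data)
```

Keeping matching separate from reading means the list of files can be inspected, filtered, or redistributed before any file is opened, which is the flexibility the Java match()/readMatches() pair provides for general-purpose ingestion.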
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)