sschaetz opened a new pull request #13996:
URL: https://github.com/apache/airflow/pull/13996


   This change adds a new GCS transform operator. It is based on the existing 
transform operator, with the addition that it transforms multiple files: all 
files that match a prefix and were updated within a time span. The time span is 
defined implicitly as the interval between the current execution timestamp of 
the DAG instance (time-span start) and the next execution timestamp of the DAG 
(time-span end). 
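
   As a rough illustration of the selection rule described above, the sketch 
below filters object metadata by prefix and last-updated time. It is not the 
code from this pull request: `BlobInfo` and `select_blobs` are hypothetical 
names, and treating the time span as half-open (start inclusive, end 
exclusive) is an assumption, since the description does not say how the 
boundaries are handled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class BlobInfo:
    """Hypothetical stand-in for a GCS object's metadata."""

    name: str
    updated: datetime  # time of the object's last update


def select_blobs(
    blobs: Iterable[BlobInfo],
    prefix: str,
    span_start: datetime,
    span_end: datetime,
) -> List[BlobInfo]:
    """Return blobs whose name starts with `prefix` and whose update time
    falls in [span_start, span_end).

    Assumption: the interval is half-open, so a file updated exactly at
    span_end is left for the next DAG run.
    """
    return [
        blob
        for blob in blobs
        if blob.name.startswith(prefix) and span_start <= blob.updated < span_end
    ]
```

With this rule, consecutive DAG runs cover adjacent, non-overlapping time 
spans, so each updated file would be picked up by exactly one run.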
   
   The use case is an entity that generates an undefined number of files at 
irregular intervals. The operator picks up all files that were updated since 
it last executed. Typically, the transform script iterates over the files, 
opens them, extracts some information, collates the results into one or more 
files, and uploads them to GCS. These result files can then be loaded into 
BigQuery, processed further, or served via a web server.
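
   A transform step of the kind described above might look like the following 
sketch. It assumes the matching files have already been downloaded into a 
local source directory and that the collated result is written to a local 
file for later upload; that calling convention, the function name 
`collate_line_counts`, and the choice of "line count" as the extracted 
information are all illustrative assumptions, not something this pull request 
specifies.

```python
import csv
from pathlib import Path


def collate_line_counts(source_dir: str, dest_file: str) -> int:
    """Collate one summary row per input file into a single CSV file.

    Iterates over the files in `source_dir`, opens each one, extracts a piece
    of information (here: its line count), and writes all rows to `dest_file`.
    Returns the number of input files processed.
    """
    rows = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file():
            with path.open("r", encoding="utf-8") as handle:
                rows.append((path.name, sum(1 for _ in handle)))

    # Create the destination directory if it does not exist yet.
    Path(dest_file).parent.mkdir(parents=True, exist_ok=True)
    with open(dest_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "lines"])
        writer.writerows(rows)
    return len(rows)
```

The single CSV produced this way is the sort of result file that could then 
be loaded into BigQuery.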


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

