Hello!
I have a shell script that performs certain actions on files in HDFS. The script is run through an Oozie workflow, which I want to schedule in Falcon.

The files, as usual, are located in date partitions (/some_root_dir/2016/02/03 etc.). Every day a new directory appears and new data arrives. The problem is that sometimes data may arrive a few days late, and I want Falcon to recognize that and, upon late arrival, run the Oozie/shell action on that data as well, not only on today's portion.

But that part is insufficiently documented at the moment:
https://falcon.apache.org/FalconDocumentation.html#Handling_late_input_data
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_data_governance/content/ch_falcon_late_data_handling.html

I don't understand what should be in the late workflow. How, and at which moment, does Falcon decide which directories to run the late workflow on? How are the dates (locations) of those directories passed to the late workflow?

Best regards,
Mike
