Hi Spark Experts:
I am trying to use a stateful UDF with Spark Structured Streaming, where the UDF
needs to update its state periodically.
Here is the scenario:
1. I have a UDF with a variable that has a default value (e.g. 1). This value is
applied to a column (e.g. the variable is subtracted from the column value); see
the sketch below.
2. The variable is to be updated periodically and asynchronously (e.g. by
re-reading a file every 5 minutes), and new rows should have the new value
applied to the column value.
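To make point 1 concrete, here is a minimal sketch of what I have in mind
(Scala, Spark 3.x; the variable, column, and source names are just placeholders
for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object AdjustmentUdfSketch {
  // the variable with a default value of 1 that should be refreshed every few minutes
  @volatile var adjustment: Int = 1

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("adjustment-udf-sketch").getOrCreate()

    // subtract the current value of the variable from the column value
    val applyAdjustment = udf((value: Long) => value - adjustment)

    // the "rate" source just stands in for my real streaming source
    val input = spark.readStream.format("rate").load()

    input.withColumn("adjusted", applyAdjustment(col("value")))
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}

What I am missing is a way to push a new value of "adjustment" to the running
UDF on the executors.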
Spark natively supports broadcast variables, but I could not find a way to
update a broadcast variable dynamically, or to rebroadcast it, so that the UDF's
internal state can be updated while the Structured Streaming application is
running.
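For reference, this is roughly the broadcast-based variant I looked at (building
on the sketch above, again with illustrative names):

// broadcast the default value once at startup
val adjustmentBc = spark.sparkContext.broadcast(1)

// the udf reads the broadcast value, but that value is fixed after broadcasting
val applyAdjustment = udf((value: Long) => value - adjustmentBc.value)

// adjustmentBc.unpersist() only removes the cached copies on the executors, and
// broadcasting again produces a new Broadcast object, but the udf registered
// above still closes over the old one, so the running query never sees the new
// value.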
I could read the variable from the file on each invocation of the UDF, but that
will not scale, since every invocation would open, read, and close the file.
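This is the per-invocation read I would like to avoid (the file path here is
just a hypothetical placeholder):

import scala.io.Source

val applyAdjustment = udf { (value: Long) =>
  // opens, reads, and closes the file for every single row, which is the part
  // that I expect will not scale
  val src = Source.fromFile("/path/to/adjustment.txt") // hypothetical path
  val adjustment = try src.getLines().next().trim.toInt finally src.close()
  value - adjustment
}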
Please let me know if there is any documentation or an example covering this
scenario.
Thanks


