Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/701
  
    @jinfengni,
    
    As it turns out, we do have a comprehensive design for the original feature 
and the MVCC revision. The key goal is that a function, once registered, is 
guaranteed to be available on all Drillbits once it is visible to any 
particular Foreman. Without this guarantee of consistency, DUDFs become 
non-deterministic and will cause customer problems.
    
    We do have a "refresh" operation: registering a DUDF updates ZK, which sends 
updates to each node. The problem is the race condition. I register a UDF foo() 
on node A. I run a query from that same node. If my query happens to hit node B 
before the ZK notification reaches it, the query will fail. Our goal is that 
such a failure cannot happen, hence the need for a "pull" model to augment the 
ZK-based "push" model.
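    
    To make the push/pull distinction concrete, here is a minimal sketch. It is 
not Drill's actual code: the class names, the version counter and the in-memory 
stand-in for ZK are all assumptions made for illustration. The point is only 
that a node which misses on lookup checks the shared version and pulls before 
it gives up.

```python
# Illustrative sketch only; every name here is hypothetical, not Drill's API.

class RemoteRegistry:
    """Stands in for the shared function registry kept in ZK."""

    def __init__(self):
        self.version = 0
        self.functions = {}

    def register(self, name, impl):
        self.functions[name] = impl
        self.version += 1  # every registration bumps the shared version


class LocalRegistry:
    """Per-node cache of the shared registry."""

    def __init__(self, remote):
        self.remote = remote
        self.version = 0
        self.functions = {}

    def on_push(self):
        # "Push": called when the ZK notification arrives at this node.
        self._sync()

    def lookup(self, name):
        # "Pull": on a miss, re-sync if the shared version has moved on,
        # so a late notification cannot make the query fail.
        if name not in self.functions and self.version != self.remote.version:
            self._sync()
        return self.functions.get(name)

    def _sync(self):
        self.functions = dict(self.remote.functions)
        self.version = self.remote.version


remote = RemoteRegistry()
node_b = LocalRegistry(remote)
remote.register("foo", lambda x: x + 1)  # registered via node A
# Node B has not received the push yet, but the lookup still succeeds.
found = node_b.lookup("foo")
```

    In this sketch a truly unknown function still returns None, since the 
versions match after the pull and there is nothing further to fetch.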
    
    A manual "update" would have the same issue unless we synchronized the 
update across all nodes. Also, the only way to ensure that DUDFs are available 
is to issue an update after adding each DUDF. But, if we did that, we might as 
well make the DUDF registration itself synchronous across all nodes.
    
    And, of course, the node synchronization does not handle the race condition 
in which a new node comes up right after a synchronization starts. We'd have to 
ensure that the new node reads the proper state from ZK. We can do that if we 
first update ZK, then do synchronization to all nodes, then update ZK with the 
fact that all nodes are aware of the DUDF. 
    
    Without the "two-phase" process, our new node can come up, learn of the new 
DUDF and issue a query using the DUDF without some nodes having been notified 
of the synchronization.
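    
    The two-phase sequence described above can be sketched as follows. Again, 
this is an illustration under stated assumptions, not the proposed 
implementation: the PENDING/VISIBLE states, the Zk and Node classes and the 
three function names are hypothetical. A node that starts mid-synchronization 
reads the pending entry from ZK and loads it, but no Foreman plans with the 
function until the second ZK update marks it visible.

```python
# Illustrative sketch only; states and names are assumptions, not Drill's API.

PENDING = "PENDING"
VISIBLE = "VISIBLE"


class Zk:
    """Stands in for the ZK store: function name -> registration state."""

    def __init__(self):
        self.state = {}


class Node:
    def __init__(self, zk):
        # A node that comes up loads every function recorded in ZK,
        # including ones whose registration is still pending.
        self.loaded = set(zk.state)

    def load(self, name):
        self.loaded.add(name)

    def can_plan(self, name, zk):
        # A Foreman uses a function only once it is marked visible.
        return zk.state.get(name) == VISIBLE


def begin_registration(zk, name):
    zk.state[name] = PENDING      # phase 1: record the DUDF in ZK


def synchronize(nodes, name):
    for node in nodes:            # notify every node known at this point
        node.load(name)


def commit_registration(zk, name):
    zk.state[name] = VISIBLE      # phase 2: record that all nodes know it


zk = Zk()
a, b = Node(zk), Node(zk)
begin_registration(zk, "foo")
late = Node(zk)                   # new node appears mid-registration
usable_early = late.can_plan("foo", zk)
synchronize([a, b], "foo")
commit_registration(zk, "foo")
usable_after = late.can_plan("foo", zk)
```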
    
    Overall, this is a difficult area. Relying on the well-known semantics of 
MVCC makes the problems much easier to solve.
    
    So, the question here is whether it is worth checking in this partial 
solution for 1.10, or whether to leave the problem open until a complete 
solution is available.

