[GitHub] drill issue #701: DRILL-4963: Fix issues with dynamically loaded overloaded ...

paul-rogers Fri, 03 Feb 2017 16:01:23 -0800

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/701
  
    As I understand it, the issues are these:
    
    * When parsing/planning a query, function references are ambiguous.
    * If a function x() simply does not exist at all, we get a clear signal and 
can check the function registry for updates.
    * When names are overridden, there may be many x(arg1) functions: some 
defined locally, some recently added to the registry.
    * In this case, we get no clear signal that we should check the registry 
since we might find a good-enough match locally and not know to check for a 
better match in the registry.
    
    The solution is to check the registry version on each new query. This must 
be done for every query (with functions) since we can never be certain whether 
an override exists in the registry.
    
    The problem is that a check of ZK, even to get a version, adds latency. 
99.9% of the time, nothing will have changed. How can we fix that?
    
    Let's take a step back. The original problems we tried to solve were:
    
    If a user executes a CREATE FUNCTION x on Drillbit A and the command 
returns successfully, then:
    
    * If the same user immediately executes a query using x on the same 
Drillbit, that query succeeds.
    * If some other user executes a query using x (on any Drillbit, say 
Drillbit B), then the query either fails (x is not found) or succeeds (x is 
found on Drillbit B and on all Drillbits that run fragments.)
    
    In general, the above requires lots of synchronization: we'd want every 
Drillbit to synchronize with the registry before every query parse and fragment 
execution. We know that is expensive. So, we looked for special case 
opportunities to exploit. The "found/not found" semantics above appeared to 
provide that special case. What that trick off the table, we are back to the 
massive synchronization solution, which is costly.
    
    We want to keep the semantics listed above, but without the cost of 
synchronization.
    Just tossing out ideas (I'm not (yet) proposing changes), perhaps:
    
    * Each Drillbit maintains a cache of the most recent ZK registry version it 
has seen.
    * When a Foreman registers a function, it updates the ZK registry and its 
locally-cached version number.
    * When a Foreman runs a query, it includes the registry version number in 
the physical plan sent to each Drillbit for execution.
    * The Drillbit checks the plan's version number against its cached version. 
If the plan version is newer, the Drillbit downloads the latest registry data 
and updates its cached version number.
    
    The above ensures that, if a function is registered on Drillbit A, then any 
query that is submitted to A will execute anywhere on the cluster. So far so 
good.
    
    But, what about queries submitted to Drillbits B, C and D? How do they 
learn about the new functions? Here, perhaps we can use an eventually 
consistent pattern. The other Drillbits listen for ZK change notifications and 
refresh their local registries from ZK, and update the locally cached version 
number.
    
    Now, we get the semantics that if a function is defined on Drillbit A, a 
brief time later it will be available on Drillbits B, C and D. Once available, 
the above rules apply: the registry version is written into the plan and all 
other fragment executors will force update their local registries if they 
haven't yet gotten the ZK update notices.
    
    The only drawback is a slight delay in a function becoming available to the 
cluster. But, that delay is fine in an eventually consistent design. Said 
another way, we make no transaction guarantees that updating something on 
Drillbit A will be immediately reflected on B, etc.
    
    This is a rough draft. It may be what the code does. I'll continue to 
review and revise the note as I learn more.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #701: DRILL-4963: Fix issues with dynamically loaded overloaded ...

Reply via email to