GitHub user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/574#discussion_r78030815
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFunctionRegistry.java
---
@@ -64,62 +76,134 @@
.put("CONVERT_FROM", Pair.of(2, 2))
.put("FLATTEN", Pair.of(1, 1)).build();
+ /** Registers all functions present in Drill classpath on start-up. All functions will be marked as built-in.*/
public DrillFunctionRegistry(ScanResult classpathScan) {
+ validate(BUILT_IN, classpathScan);
+ register(BUILT_IN, classpathScan, this.getClass().getClassLoader());
+ if (logger.isTraceEnabled()) {
+ StringBuilder allFunctions = new StringBuilder();
+ for (DrillFuncHolder method: registryHolder.getAllFunctionsWithHolders().values()) {
+ allFunctions.append(method.toString()).append("\n");
+ }
+ logger.trace("Registered functions: [\n{}]", allFunctions);
+ }
+ }
+
+ /**
+ * Validates all functions, present in jars.
+ * Will throw {@link FunctionValidationException} if:
+ * 1. Jar with the same name has been already registered.
+ * 2. Conflicting function with the similar signature is found.
+ * 3. Aggregating function is not deterministic.
+ *
+ * @return list of validated functions
+ */
+ public List<Func> validate(String jarName, ScanResult classpathScan) {
+ List<Func> functions = Lists.newArrayList();
FunctionConverter converter = new FunctionConverter();
List<AnnotatedClassDescriptor> providerClasses = classpathScan.getAnnotatedClasses();
- // Hash map to prevent registering functions with exactly matching signatures
- // key: Function Name + Input's Major Type
- // value: Class name where function is implemented
- //
- final Map<String, String> functionSignatureMap = new HashMap<>();
+ if (registryHolder.containsJar(jarName)) {
+ throw new FunctionValidationException(String.format("Jar %s is already registered", jarName));
+ }
--- End diff --
As you noted, creating built-in functions is safe here, since they are
registered at start-up.
The race condition you are describing is handled by remote registry
versioning (and thus by ZooKeeper itself).
As you know, we have two validation steps: local and remote.
So this method is responsible for local validation.
Let's say we have:
Thread1 that registers Jar1 where F1(VARCHAR-REQUIRED) is present
Thread2 that registers Jar2 where F1(VARCHAR-REQUIRED) is present
Since F1(VARCHAR-REQUIRED) is absent from the LOCAL function registry, both
threads pass local validation successfully.
Then they start remote validation.
Each thread retrieves the remote function registry at version 1.
Since F1(VARCHAR-REQUIRED) is absent from the REMOTE function registry, both
threads pass remote validation successfully.
Then each thread updates the remote function registry and tries to send it to
ZooKeeper.
This part is controlled by ZooKeeper: eventually one thread will write its
updated remote registry to ZooKeeper first, and the remote registry version will
change to 2. The other thread will then get a VersionMismatchException. In that
case, the thread will load the remote registry at version 2 and run remote
validation again, during which it will detect the duplicate and send an
appropriate response to the user.
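
To make the flow above concrete, here is a minimal sketch of the read, validate, versioned-write, retry loop. It is only an illustration under stated assumptions: the names VersionedRegistrySketch, Snapshot, read, write and register are hypothetical and are not Drill's or ZooKeeper's API, the VersionMismatchException here is a local stand-in class, and an AtomicReference plays the role of ZooKeeper's version check.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the remote registry; not Drill's actual classes. */
final class VersionedRegistrySketch {

  /** One version of the registry: a version number plus the known signatures. */
  static final class Snapshot {
    final int version;
    final Set<String> signatures;

    Snapshot(int version, Set<String> signatures) {
      this.version = version;
      this.signatures = signatures;
    }
  }

  /** Local stand-in: the write was based on a version that is no longer current. */
  static final class VersionMismatchException extends Exception {}

  // Assumption: this reference imitates the store that enforces versioning.
  private final AtomicReference<Snapshot> store =
      new AtomicReference<>(new Snapshot(1, new HashSet<>()));

  Snapshot read() {
    return store.get();
  }

  /** Succeeds only if nobody replaced the snapshot that the caller read. */
  void write(Snapshot expected, Set<String> updated) throws VersionMismatchException {
    Snapshot next = new Snapshot(expected.version + 1, updated);
    if (!store.compareAndSet(expected, next)) {
      throw new VersionMismatchException();
    }
  }

  /** Returns true if the signature was registered, false if a duplicate was found. */
  boolean register(String signature) {
    while (true) {
      Snapshot current = read();
      if (current.signatures.contains(signature)) {
        return false; // remote validation detects the duplicate
      }
      Set<String> updated = new HashSet<>(current.signatures);
      updated.add(signature);
      try {
        write(current, updated);
        return true;
      } catch (VersionMismatchException e) {
        // Another writer got there first: reload and validate again.
      }
    }
  }

  /** Replays the Thread1/Thread2 scenario deterministically in a single thread. */
  static boolean secondWriterIsRejected() {
    String f1 = "F1(VARCHAR-REQUIRED)";
    VersionedRegistrySketch registry = new VersionedRegistrySketch();
    Snapshot seenByThread1 = registry.read();
    Snapshot seenByThread2 = registry.read(); // both see version 1 without F1
    Set<String> withF1 = new HashSet<>();
    withF1.add(f1);
    try {
      registry.write(seenByThread1, withF1); // Thread1 wins, version becomes 2
      registry.write(seenByThread2, withF1); // Thread2 writes against stale version 1
      return false;
    } catch (VersionMismatchException e) {
      // Thread2 reloads version 2, revalidates, and now sees the duplicate.
      return !registry.register(f1);
    }
  }
}
```

In this sketch the stale write fails because the stored snapshot changed between read and write, so the losing thread never overwrites the winner's update and only learns about the duplicate on its second validation pass.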
---