[ https://issues.apache.org/jira/browse/IMPALA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849578#comment-16849578 ]
ASF subversion and git services commented on IMPALA-8473:
---------------------------------------------------------

Commit 31195eb8119ac6a557486a10dc24692bb0202f85 in impala's branch refs/heads/master from Radford Nguyen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=31195eb ]

IMPALA-8473: Publish lineage info via hook

This commit introduces a hook mechanism for publishing query information from Impala. Lineage data is the immediate use case, but the mechanism is general. The legacy behavior of writing the lineage file is retained but deprecated.

Downstream consumers (i.e. runtime dependencies) can implement hooks that run at supported points in Impala's lifecycle:
- impalad startup
- query completion (see IMPALA-8572 for caveats/details)

Consumers are expected to be frontend Java dependencies, instantiated at runtime. Two backend flags configure this behavior:
- `query_event_hook_classes` specifies a comma-separated list of hook implementation classes that are instantiated and registered at impalad startup.
- `query_event_hook_nthreads` specifies the number of threads used for asynchronous hook execution (relevant if multiple hooks are registered).

Lineage information is passed from the backend after a query completes (but before it returns) and handed to every hook, which executes asynchronously. In other words, a query may complete and return to the user before some or all of the hooks have finished executing.

An exception thrown during a hook's on-query-complete execution is logged and is not (directly) fatal to the system.
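The startup half of this flow (splitting the `query_event_hook_classes` value on commas, instantiating each named class, and invoking a startup callback) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Impala's frontend code: `HookLoaderSketch`, `QueryHook`, `onStartup`, `onQueryComplete` and `LoggingHook` are hypothetical names, each hook class is assumed to have an accessible no-arg constructor, and failing fast on a bad class name is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: turns a comma-separated class list into registered hooks. */
final class HookLoaderSketch {

  /** Illustrative hook contract mirroring the two hook points (not Impala's API). */
  interface QueryHook {
    void onStartup();

    void onQueryComplete(String lineageJson);
  }

  /** Example consumer used for demonstration: counts startup callbacks. */
  static final class LoggingHook implements QueryHook {
    static int startups = 0;

    @Override
    public void onStartup() {
      startups++;
    }

    @Override
    public void onQueryComplete(String lineageJson) {
      System.out.println("lineage: " + lineageJson);
    }
  }

  private HookLoaderSketch() {}

  /** Parses " a.B, ,c.D " into ["a.B", "c.D"], skipping blank entries. */
  static List<String> parseClassList(String flagValue) {
    List<String> names = new ArrayList<>();
    if (flagValue == null) {
      return names;
    }
    for (String part : flagValue.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  /** Instantiates each named class reflectively and calls its startup hook. */
  static List<QueryHook> loadHooks(String flagValue) {
    List<QueryHook> hooks = new ArrayList<>();
    for (String name : parseClassList(flagValue)) {
      try {
        Class<?> cls = Class.forName(name);
        if (!QueryHook.class.isAssignableFrom(cls)) {
          throw new IllegalArgumentException(name + " does not implement QueryHook");
        }
        QueryHook hook = (QueryHook) cls.getDeclaredConstructor().newInstance();
        hook.onStartup();
        hooks.add(hook);
      } catch (ReflectiveOperationException e) {
        // Assumption for this sketch: an unloadable hook class is a startup error.
        throw new IllegalArgumentException("cannot instantiate hook " + name, e);
      }
    }
    return Collections.unmodifiableList(hooks);
  }
}
```

A caller would pass the raw flag value, e.g. `loadHooks("com.example.AuditHook,com.example.LineageHook")` (hypothetical class names), and keep the returned list for later query-completion dispatch.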
Tests:
- added unit tests for FE hook execution
- added E2E tests for hook configuration, execution, error
- ran full build, tests

Change-Id: I23a896537a98bfef07fb27c70e9a87c105cd77a1
Reviewed-on: http://gerrit.cloudera.org:8080/13352
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Refactor lineage publication mechanism to allow for different consumers
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-8473
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8473
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: radford nguyen
>            Assignee: radford nguyen
>            Priority: Critical
>         Attachments: ImpalaPostExecHook-infra.patch
>
>
> Impetus for this change is to allow lineage to be consumed by Atlas via Kafka.
> h3. Design Proposal
> Implement a plugin approach (similar to {{authorization_provider}}) for
> consuming query event hooks, where downstream users can provide their own
> hook implementations as runtime dependencies.
> Keep but deprecate existing lineage event file writing.
> [~mad...@apache.org] has provided a fe patch (attached) with suggested
> mechanism for allowing multiple hooks to be registered with the fe. Hooks
> would be invoked from the be at appropriate places, e.g.
> [https://github.com/apache/impala/blob/c1b0a073938c144e9bf33901bd4df6dcda0f09ec/be/src/service/impala-server.cc#L466].
> The hooks should all be executed asynchronously, so the current thinking is
> that this execution should happen in the fe, since the be does not know about
> what hooks are registered. IOW, the
> {{ImpalaPostExecHookFactory.executeHooks}} method (see patch) should probably
> make use of a thread-pool executor service (or something similar) in order to
> execute all hooks in parallel and in a non-blocking manner, returning to the
> be asap.
>
> h3. Code Review
> [https://gerrit.cloudera.org/#/c/13352/]
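The design proposal's suggestion, that `executeHooks` hand every hook to a thread-pool executor and return to the backend immediately, can be sketched as follows. `AsyncHookExecutorSketch` is a hypothetical class, not the attached patch or Impala's implementation; `nThreads` plays the role that `query_event_hook_nthreads` plays, and modeling each hook as a `Consumer` of an arbitrary event (e.g. a lineage JSON string) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical dispatcher: runs every registered hook on a fixed-size pool. */
final class AsyncHookExecutorSketch<T> implements AutoCloseable {
  private static final Logger LOG =
      Logger.getLogger(AsyncHookExecutorSketch.class.getName());

  private final List<Consumer<T>> hooks;
  private final ExecutorService pool;

  /** nThreads stands in for the query_event_hook_nthreads setting (assumption). */
  AsyncHookExecutorSketch(List<Consumer<T>> hooks, int nThreads) {
    this.hooks = new ArrayList<>(hooks);
    this.pool = Executors.newFixedThreadPool(nThreads);
  }

  /**
   * Submits each hook and returns immediately without waiting for any of them.
   * The returned futures let a test wait; a query path would ignore them.
   */
  List<Future<?>> executeHooks(T event) {
    List<Future<?>> futures = new ArrayList<>();
    for (Consumer<T> hook : hooks) {
      futures.add(pool.submit(() -> {
        try {
          hook.accept(event);
        } catch (RuntimeException e) {
          // A failing hook is only logged; the exception never reaches the caller.
          LOG.log(Level.WARNING, "query event hook failed", e);
        }
      }));
    }
    return futures;
  }

  /** Stops accepting new work; already-submitted hooks still run to completion. */
  @Override
  public void close() {
    pool.shutdown();
  }
}
```

Because `executeHooks` only submits work, the caller can return before some or all hooks have finished, and one hook throwing does not prevent the others from running.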