[ 
https://issues.apache.org/jira/browse/IMPALA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849578#comment-16849578
 ] 

ASF subversion and git services commented on IMPALA-8473:
---------------------------------------------------------

Commit 31195eb8119ac6a557486a10dc24692bb0202f85 in impala's branch 
refs/heads/master from Radford Nguyen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=31195eb ]

IMPALA-8473: Publish lineage info via hook

This commit introduces a hook mechanism for publishing
data from Impala: lineage data specifically, and query
information more generally.

The legacy behavior of writing the lineage file is
being retained but deprecated.

Hooks can be implemented by downstream consumers (i.e.
runtime dependencies) to hook into supported places during
Impala query execution:

- impalad startup
- query completion
    - see IMPALA-8572 for caveat/details

The consumers are to be frontend Java dependencies
instantiated at runtime. Two backend flags configure this
behavior:

- `query_event_hook_classes` specifies a comma-separated
list of hook consumer implementation classes that
are instantiated and registered at Impala startup.

- `query_event_hook_nthreads`
specifies the number of threads to use for asynchronous
hook execution.  (Relevant if multiple hooks are
registered.)
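
As a rough illustration of how a comma-separated class list such as
the value of `query_event_hook_classes` could be turned into hook
instances, here is a minimal Java sketch. It is not the code from this
commit: `StartupHook`, `NoopHook`, and `HookLoader` are hypothetical
names invented for the example, and the real Impala interfaces may
differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook interface (an assumption, not Impala's actual API). */
interface StartupHook {
  void onStartup();
}

/** Trivial hook used only to demonstrate loading a class by name. */
class NoopHook implements StartupHook {
  @Override
  public void onStartup() {}
}

/** Hypothetical loader: parses a comma-separated flag value into hooks. */
final class HookLoader {
  private HookLoader() {}

  /** Instantiates each named class reflectively and calls its startup method. */
  static List<StartupHook> load(String commaSeparatedClasses)
      throws ReflectiveOperationException {
    List<StartupHook> hooks = new ArrayList<>();
    if (commaSeparatedClasses == null) return hooks;
    for (String name : commaSeparatedClasses.split(",")) {
      String trimmed = name.trim();
      if (trimmed.isEmpty()) continue;  // tolerate blank entries and stray commas
      // asSubclass throws ClassCastException if the class is not a StartupHook.
      Class<? extends StartupHook> cls =
          Class.forName(trimmed).asSubclass(StartupHook.class);
      StartupHook hook = cls.getDeclaredConstructor().newInstance();
      hook.onStartup();
      hooks.add(hook);
    }
    return hooks;
  }
}
```

In this sketch a misconfigured class name surfaces as an exception
from `load`, so a bad flag value would fail at startup rather than
silently dropping a hook; whether Impala behaves that way is not
stated here.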

Lineage information is passed from the backend after
a query completes (but before it returns) and given
to every hook to execute asynchronously.  In other words,
a query may complete and return to the user before any
or all hooks have completed executing.  An exception
thrown while a hook handles query completion is simply logged
and is not (directly) fatal to the system.
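
The asynchronous, log-and-continue behavior described above can be
sketched with a fixed-size thread pool. This is only an illustration
under assumptions: `CompletionHook` and `HookDispatcher` are
hypothetical names, the lineage payload is modeled as a plain string,
and the pool size stands in for `query_event_hook_nthreads`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical hook interface (an assumption, not Impala's actual API). */
interface CompletionHook {
  void onQueryComplete(String lineage) throws Exception;
}

/** Hypothetical dispatcher that runs every hook on a shared thread pool. */
final class HookDispatcher {
  private static final Logger LOG =
      Logger.getLogger(HookDispatcher.class.getName());
  private final List<CompletionHook> hooks;
  private final ExecutorService pool;

  HookDispatcher(List<CompletionHook> hooks, int nThreads) {
    this.hooks = new ArrayList<>(hooks);  // defensive copy
    this.pool = Executors.newFixedThreadPool(nThreads);
  }

  /** Submits one task per hook and returns without waiting for any of them. */
  void dispatch(String lineage) {
    for (CompletionHook hook : hooks) {
      pool.submit(() -> {
        try {
          hook.onQueryComplete(lineage);
        } catch (Exception e) {
          // A failing hook is logged; other hooks and the caller are unaffected.
          LOG.log(Level.WARNING, "Hook failed: " + hook.getClass().getName(), e);
        }
      });
    }
  }

  /** Stops accepting work and waits up to timeoutMs for queued hooks to finish. */
  boolean shutdownAndAwait(long timeoutMs) throws InterruptedException {
    pool.shutdown();
    return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
  }
}
```

Because `dispatch` only enqueues work, the caller can return to the
user before any hook has run, which matches the behavior described in
the paragraph above.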

Tests:
- added unit tests for FE hook execution
- added E2E tests for hook configuration, execution, error
- ran full build, tests

Change-Id: I23a896537a98bfef07fb27c70e9a87c105cd77a1
Reviewed-on: http://gerrit.cloudera.org:8080/13352
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Refactor lineage publication mechanism to allow for different consumers
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-8473
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8473
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: radford nguyen
>            Assignee: radford nguyen
>            Priority: Critical
>         Attachments: ImpalaPostExecHook-infra.patch
>
>
> Impetus for this change is to allow lineage to be consumed by Atlas via Kafka.
> h3. Design Proposal
> Implement a plugin approach (similar to {{authorization_provider}}) for 
> consuming query event hooks, where downstream users can provide their own 
> hook implementations as runtime dependencies.
> Keep but deprecate existing lineage event file writing.
> [~mad...@apache.org] has provided a fe patch (attached) with suggested 
> mechanism for allowing multiple hooks to be registered with the fe.  Hooks 
> would be invoked from the be at appropriate places, e.g. 
> [https://github.com/apache/impala/blob/c1b0a073938c144e9bf33901bd4df6dcda0f09ec/be/src/service/impala-server.cc#L466].
>   The hooks should all be executed asynchronously, so the current thinking is 
> that this execution should happen in the fe, since the be does not know about 
> what hooks are registered.  IOW, the 
> {{ImpalaPostExecHookFactory.executeHooks}} method (see patch) should probably 
> make use of a thread-pool executor service (or something similar) in order to 
> execute all hooks in parallel and in a non-blocking manner, returning to the 
> be asap.
>  
> h3. Code Review
> [https://gerrit.cloudera.org/#/c/13352/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
