[
https://issues.apache.org/jira/browse/IMPALA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
radford nguyen updated IMPALA-8473:
-----------------------------------
Description:
The impetus for this change is to allow lineage to be consumed by Atlas via Kafka.
h3. Design Proposal
Move lineage logging from be to fe, where we can make use of the same plugin
approach as {{authorization_provider}} to allow a downstream user to provide
their own lineage consumers as runtime dependencies.
[[email protected]] has provided a fe patch (attached) with a suggested
mechanism for allowing multiple hooks to be registered with the fe. Hooks
would be invoked from the be at appropriate places, e.g.
[https://github.com/apache/impala/blob/c1b0a073938c144e9bf33901bd4df6dcda0f09ec/be/src/service/impala-server.cc#L466].
The hooks should all be executed asynchronously, so the current thinking is
that this execution should happen in the fe, since the be does not know which
hooks are registered. IOW, the {{ImpalaPostExecHookFactory.executeHooks}}
method (see patch) should probably make use of a thread-pool executor service
(or something similar) in order to execute all hooks in parallel and in a
non-blocking manner, returning to the be asap.
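A rough sketch of the non-blocking dispatch described above, in case it helps the discussion. Everything here is an assumption for illustration: {{PostExecHook}}, {{AsyncHookRunner}}, the {{String}} lineage payload and the fixed-size pool are hypothetical stand-ins, not the interfaces in the attached patch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical hook interface; the one in the attached patch may look different.
interface PostExecHook {
  void postQueryExecute(String lineageJson);
}

// Hypothetical stand-in for ImpalaPostExecHookFactory.
final class AsyncHookRunner {
  private final List<PostExecHook> hooks = new CopyOnWriteArrayList<>();
  private final ExecutorService pool;

  AsyncHookRunner(int numThreads) {
    // Assumption: a small fixed pool; the pool type and size are open design choices.
    this.pool = Executors.newFixedThreadPool(numThreads);
  }

  void register(PostExecHook hook) {
    hooks.add(hook);
  }

  // Submits one task per registered hook and returns without waiting for any
  // of them, so the caller (the be, via the fe) is not blocked by slow hooks.
  void executeHooks(String lineageJson) {
    for (PostExecHook hook : hooks) {
      pool.execute(() -> {
        try {
          hook.postQueryExecute(lineageJson);
        } catch (RuntimeException e) {
          // One failing hook must not affect the others.
          System.err.println("Post-exec hook failed: " + e);
        }
      });
    }
  }

  // Stops accepting new work and waits for already-submitted hooks to finish.
  // Returns false on timeout or interruption.
  boolean shutdownAndWait(long timeoutMs) {
    pool.shutdown();
    try {
      return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With this shape, {{executeHooks}} only enqueues work, so the call from the be returns as soon as the tasks are submitted. An unbounded queue behind a fixed pool can grow if hooks are slower than query throughput, so a bounded queue with a rejection policy may be worth considering.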
h3. Code Review
[https://gerrit.cloudera.org/#/c/13352/]
> Refactor lineage publication mechanism to allow for different consumers
> -----------------------------------------------------------------------
>
> Key: IMPALA-8473
> URL: https://issues.apache.org/jira/browse/IMPALA-8473
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Reporter: radford nguyen
> Assignee: radford nguyen
> Priority: Critical
> Attachments: ImpalaPostExecHook-infra.patch