[
https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161529#comment-17161529
]
Thomas Tauber-Marshall commented on IMPALA-9180:
------------------------------------------------
Some examples of things that can be removed as part of this:
- In BackendDescriptorPB there's an 'address' and a 'krpc_address', see:
https://github.com/apache/impala/blob/master/common/protobuf/statestore_service.proto#L63
The port that's part of 'address' is no longer in use. However, we need both
the hostname (currently in 'address') and the IP address (currently in
'krpc_address') in order to do things like call RpcMgr::GetProxy. Probably the
easiest thing to do is to replace 'TNetworkAddress address' with 'string
hostname'
- In TQueryCtx, 'coord_address' is no longer in use and can be removed.
- The files be/src/service/impala-internal-service.(h/cc) are no longer in use
and can be removed.
- The flag 'be_port' is no longer used and can be made a REMOVED_FLAG and all
infrastructure around it can be cleaned up (eg. in
tests/common/impala_service.py)
When removing these things, what to replace them with will need some thought:
- In cases where we've keyed maps on a TNetworkAddress of a backend, probably
we should change it to key off of the UniqueIdPB backend id, eg. in
ExecutorBlacklist I think we can refactor 'executor_list_' to no longer be a
map<address -> list<backend>> and instead make it a map<backend_id, backend>,
which will simply the logic in ExecutorBlacklist a lot.
- In cases where we're using the address in logging or error output, replacing
it with the backend id has the advantage that its more precise (eg. if you're
examining logs to try to track down the cause of an issue that occurred around
the time an impalad crashed and a new one was restarted at the same address it
will allow you to differentiate things that happened on each impalad) but the
disadvantage of being less human friendly than hostnames. Including both might
be appropriate sometimes, but might also make things too verbose. Possibly a
good rule of thumb would be: if we're logging it, include as much info as
possible (since its okay if logs get a little verbose) but if its going in an
error message that will be returned to a client just use the hostname (since
human readability is the most important here)
This work will probably involve touching a lot of code, so it probably makes
sense to break it up into a series of patches to keep reviewing simple, eg. the
changes suggested above to ExecutorBlacklist are self-contained and could be
submitted in a patch by themselves before we even actually remove
BackendDescriptorPB::address.
> Remove legacy ImpalaInternalService
> -----------------------------------
>
> Key: IMPALA-9180
> URL: https://issues.apache.org/jira/browse/IMPALA-9180
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Michael Ho
> Assignee: Wenzhe Zhou
> Priority: Minor
>
> Now that IMPALA-7984 is done, the legacy Thrift based Impala internal service
> can now be removed. The port 22000 can also be freed up. In addition to code
> change, the doc probably needs to be updated to reflect the fact that 22000
> is no longer in use.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]