[
https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339624#comment-14339624
]
Andrew Or commented on SPARK-6048:
----------------------------------
bq. What do you mean by "duplicates the translation" (regarding 2)? It's just a
call to "translateKey()".
It's not about the number of lines that are being duplicated. It's about the
translation logic. Right now it's not correct to translate in set but not in
all interfaces exposed by SparkConf. As we have seen with the case of `remove`
it's easy to miss one or two of these interfaces. If we only translate in `get`
then we don't have to worry about this.
bq. Regarding 1, that problem exists regardless of my change.
That's actually not true. Before your change, if we specify both the deprecated
config and the most recent one, the behavior will be determined by the place
where these values are used. Even if we called `set` on the deprecated config
over the more recent one, the value of the latter is still preserved because we
didn't translate on `set`. To answer your question, the expected behavior is
for the value of the more recent config to *always* take precedence.
bq. Note the goal of the deprecated configs was to make the Spark code only
have to care about the most recent key name. Your proposal goes against that,
and would require the deprecated names to live both in SparkConf and in the
code that needs to read them.
Yes, unfortunately, and I agree it's something we need to fix in the future. My
eventual goal is to do hide all the deprecation logic throughout the Spark
code, and this is why I filed SPARK-5933 before. Currently, however, this is a
correctness issue that is blocking the 1.3 release, so my personal opinion is
that we should first fix this broken behavior and worry about the code style
later.
> SparkConf.translateConfKey should translate on get, not set
> -----------------------------------------------------------
>
> Key: SPARK-6048
> URL: https://issues.apache.org/jira/browse/SPARK-6048
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Blocker
>
> There are several issues with translating on set.
> (1) The most serious one is that if the user has both the deprecated and the
> latest version of the same config set, then the value picked up by SparkConf
> will be arbitrary. Why? Because during initialization of the conf we call
> `conf.set` on each property in `sys.props` in an order arbitrarily defined by
> Java. As a result, the value of the more recent config may be overridden by
> that of the deprecated one. Instead, we should always use the value of the
> most recent config.
> (2) If we translate on set, then we must keep translating everywhere else. In
> fact, the current code does not translate on remove, which means the
> following won't work if X is deprecated:
> {code}
> conf.set(X, Y)
> conf.remove(X) // X is not in the conf
> {code}
> This requires us to also translate in remove and other places, as we already
> do for contains, leading to more duplicate code.
> (3) Since we call `conf.set` on all configs when initializing the conf, we
> print all deprecation warnings in the beginning. Elsewhere in Spark, however,
> we warn the user when the deprecated config / option / env var is actually
> being used.
> We should keep this consistent so the user won't expect to find all
> deprecation messages in the beginning of his logs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]