Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Andrew Schwartzmeyer Tue, 04 Sep 2018 14:13:40 -0700

I think your approach would be fairly sound. That is, to change thelogic to read the IDs from the info file instead of the paths. But Ialso think we can punt this for now (as I do not think a task ID like'Hello%3AWorld' is plausibly in use right now), and implement a fix forcolons now that would remain compatible.

If we add encode/decode logic for colons on Windows, we do not introducebackward compatibility issues on other platforms (as we'd constrain thechange to Windows), and in the future, we can safely replace the decodelogic with your approach. That is to say, we implement the encoding assparingly as possible, but implement it now, because it's kind ofrequired, and we implement the decoding only as a stop-gap until wereplace this logic with reading from the info file instead. If we laterfind another character in use that also needs to be encoded, we can thenabstract the single encoding into a per-platform encoding set.


Does this seem reasonable?

Thanks,

Andy

P.S. Sorry this took a while to get back to, I was out last week.

On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:

I'm a bit concerned about the recovery logic and backwardcompatibility:
The changes we're making shouldn't affect existing users,
and we should try hard to avoid any future backward compatibilityproblem.
Say if there is already some custom framework using task ID'Hello%3AWorld',then if we blindly decode the task path during recovery, we will getthe
wrong ID 'Hello:World'.
On the other hand, if we don't decode the task path during recovery,
then later on during checkpointing for the same task,
we shouldn't blindly encode the task ID, because it might create a
different path,
and we'll need to introduce some migration code to avoid suchduplication.
Fortunately, we do checkpoint the executor IDs and task IDs in the info
files under the meta dir.
So I'm considering the following design to minimize the backward
compatibility issue we might have:
During recovery, we don't decode the recovered task path,
but get the executor/task ID from the info file instead of relying on
parsing the executor/task path.
When checkpointing, we only encode executor/task IDs if they contain
reserved characters.
The set of reserved characters could be defined as a platform-dependent
variable,
similar to what we have done for `PATH_SEPARATOR`.

The above design would look a bit more complicated then just blindly
applying percent encoding
to when constructing checkpoint paths, but it doesn't require extra
checkpoint migration logic,
and would keep the exact same behavior we have now for "normal"
executor/task IDs.

What did you guys think? Please feel free to raise any concern :)
And we don't need to implement the whole thing for now.
For example, we could start with just dealing with colons,
and extend the implementation later on,
as long as the partial solution we're going to have right now doesn't
create future tech debts!

Best,
Chun-Hung

On Thu, Aug 23, 2018 at 1:42 PM Greg Mann <[email protected]> wrote:
Thanks Andy! Responses inlined below.
No: As the only character we've run into a problem with is `:`
(MESOS-9109), it might not be worth it to generalize this to solve abunch
of problems that we haven't encountered.
It's true that I'm not aware of other scenarios where
filesystem-disallowed characters in task/executor IDs have causedissues
for users, and this issue has existed for a long time. However, when
feasible I would like to fix issues that we're aware of before theycauseproblems for users, rather than after. I would suggest that since wehaveone compelling case that we need to address now, it's worthformulating an
approach for the general case, so that we can be sure any current work
doesn't get in our way later on.
I'm somewhat comfortable doing so only for Windows, as we don'treallyneed to worry about the recovery scenario; but very uncomfortableabout
doing so for Linux etc., for precisely that reason.
So expanding this is definitely up for debate; but we must fix thebug
with `:`.
Indeed, addressing the general case may prove to be much more complex- Ican certainly identify with this situation, where a fix for a smallerissue
turns into a big project :)
It may turn out to be possible to implement a scoped-down solution forthe
colon case now, and extend it later on. I think it would be good if we
could at least get an idea of how we want to handle the general casenow,
so that any short-term solutions can be a constructive step toward the
long-term.

Cheers,
G

Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

Reply via email to