I think that the dead code approach, while a bit unpalatable and worse than reverting, is probably better than leaving the parameter (even if it is hidden)
On Tue, May 12, 2020 at 12:46 PM Ryan Blue <[email protected]> wrote: > +1 for the approach Jungtaek suggests. That will avoid needing to support > behavior that is not well understood with minimal changes. > > On Tue, May 12, 2020 at 1:45 AM Jungtaek Lim <[email protected]> > wrote: > >> Before I forget, we'd better not forget to change the doc, as create >> table doc looks to represent current syntax which will be incorrect later. >> >> On Tue, May 12, 2020 at 5:32 PM Jungtaek Lim < >> [email protected]> wrote: >> >>> It's not only for end users, but also for us. Spark itself uses the >>> config "true" and "false" in tests and it still brings confusion. We still >>> have to deal with both situations. >>> >>> I'm wondering how long days it would be needed to revert it cleanly, but >>> if we worry about the amount of code change just around the new RC, what >>> about make the code dirty (should be fixed soon) but less headache via >>> applying traditional (and bad) way? >>> >>> Let's just remove the config so that the config cannot be used in any >>> way (even in Spark codebase), and set corresponding field in parser to the >>> constant value so that no one can modify in any way. This would make the >>> dead code by intention which should be cleaned it up later, so let's add >>> FIXME comment there so that anyone can take it up for cleaning up the code >>> later. (If no one volunteers then I'll probably pick up.) >>> >>> That is a bad pattern, but still better as we prevent end users (even >>> early adopters) go through the undocumented path in any way, and that will >>> be explicitly marked as "should be fixed". This is different from retaining >>> config - I don't expect unified create table syntax will be landed in >>> bugfix version, so even unified create table syntax can be landed in 3.1.0 >>> (this is also not guaranteed) the config will live in 3.0.x in any way. If >>> we temporarily go dirty way then we can clean up the code in any version, >>> even from bugfix version, maybe within a couple of weeks just after 3.0.0 >>> is released. >>> >>> Does it sound valid? >>> >>> On Tue, May 12, 2020 at 2:35 PM Wenchen Fan <[email protected]> wrote: >>> >>>> SPARK-30098 was merged about 6 months ago. It's not a clean revert and >>>> we may need to spend quite a bit of time to resolve conflicts and fix >>>> tests. >>>> >>>> I don't see why it's still a problem if a feature is disabled and >>>> hidden from end-users (it's undocumented, the config is internal). The >>>> related code will be replaced in the master branch sooner or later, when we >>>> unify the syntaxes. >>>> >>>> >>>> >>>> On Tue, May 12, 2020 at 6:16 AM Ryan Blue <[email protected]> >>>> wrote: >>>> >>>>> I'm all for getting the unified syntax into master. The only issue >>>>> appears to be whether or not to pass the presence of the EXTERNAL keyword >>>>> through to a catalog in v2. Maybe it's time to start a discuss thread for >>>>> that issue so we're not stuck for another 6 weeks on it. >>>>> >>>>> On Mon, May 11, 2020 at 3:13 PM Jungtaek Lim < >>>>> [email protected]> wrote: >>>>> >>>>>> Btw another wondering here is, is it good to retain the flag on >>>>>> master as an intermediate step? Wouldn't it be better for us to >>>>>> start "unified create table syntax" from scratch? >>>>>> >>>>>> >>>>>> On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> I'm sorry, but I have to agree with Ryan and Russell. I chose the >>>>>>> option 1 because it's less worse than option 2, but it doesn't mean I >>>>>>> fully >>>>>>> agree with option 1. >>>>>>> >>>>>>> Let's make below things clear if we really go with option 1, >>>>>>> otherwise please consider reverting it. >>>>>>> >>>>>>> * Do you fully indicate about "all" the paths where the second >>>>>>> create table syntax is taken? >>>>>>> * Could you explain "why" to end users without any confusion? Do you >>>>>>> think end users will understand it easily? >>>>>>> * Do you have an actual end users to guide to turn this on? Or do >>>>>>> you have a plan to turn this on for your team/customers and deal with >>>>>>> the ambiguity? >>>>>>> * Could you please document about how things will change if the flag >>>>>>> is turned on? >>>>>>> >>>>>>> I guess the option 1 is to leave a flag as "undocumented" one and >>>>>>> forget about the path to turn on, but I think that would lead to make >>>>>>> the >>>>>>> feature be "broken window" even we are not able to touch. >>>>>>> >>>>>>> On Tue, May 12, 2020 at 6:45 AM Russell Spitzer < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I think reverting 30098 is the right decision here if we want to >>>>>>>> unblock 3.0. We shouldn't ship with features which we know do not >>>>>>>> function >>>>>>>> in the way we intend, regardless of how little exposure most users >>>>>>>> have to >>>>>>>> them. Even if it's off my default, we should probably work to avoid >>>>>>>> switches that cause things to behave unpredictably or require a flow >>>>>>>> chart >>>>>>>> to actually determine what will happen. >>>>>>>> >>>>>>>> On Mon, May 11, 2020 at 3:07 PM Ryan Blue <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I'm all for fixing behavior in master by turning this off as an >>>>>>>>> intermediate step, but I don't think that Spark 3.0 can safely include >>>>>>>>> SPARK-30098. >>>>>>>>> >>>>>>>>> The problem is that SPARK-30098 introduces strange behavior, as >>>>>>>>> Jungtaek pointed out. And that behavior is not fully understood. While >>>>>>>>> working on a unified CREATE TABLE syntax, I hit additional test >>>>>>>>> failures >>>>>>>>> <https://github.com/apache/spark/pull/28026#issuecomment-606967363> >>>>>>>>> where the wrong create path was being used. >>>>>>>>> >>>>>>>>> Unless we plan to NOT support the behavior >>>>>>>>> when spark.sql.legacy.createHiveTableByDefault.enabled is disabled, we >>>>>>>>> should not ship Spark 3.0 with SPARK-30098. Otherwise, we will have >>>>>>>>> to deal >>>>>>>>> with this problem for years to come. >>>>>>>>> >>>>>>>>> On Mon, May 11, 2020 at 1:06 AM JackyLee <[email protected]> wrote: >>>>>>>>> >>>>>>>>>> +1. Agree with Xiao Li and Jungtaek Lim. >>>>>>>>>> >>>>>>>>>> This seems to be controversial, and can not be done in a short >>>>>>>>>> time. It is >>>>>>>>>> necessary to choose option 1 to unblock Spark 3.0 and support it >>>>>>>>>> in 3.1. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Sent from: >>>>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> To unsubscribe e-mail: [email protected] >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ryan Blue >>>>>>>>> Software Engineer >>>>>>>>> Netflix >>>>>>>>> >>>>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Software Engineer >>>>> Netflix >>>>> >>>> > > -- > Ryan Blue > Software Engineer > Netflix >
