Hi Igor,
Thank you for finally addressing a long-running irritation: that the Drill Map
type is not a map, it is a tuple.
Perhaps you can divide the discussion into three parts.
1. Renaming classes, enums and other items internal to the Drill source code.
2. Renaming classes that are part of a public or ad-hoc API.
3. Renaming items visible to users.
Changing items in part 1 causes a one-time disruption to anyone who has a
working branch. However, a rebase onto master would easily resolve any issues.
So, changes in this group are pretty safe.
The PR also seems to change symbols visible to the anyone who has code in a
repo separate from Drill, but that builds against Drill. All UDFs and plugins
that use the former map classes must change. This means that those
contributions can support only Drill before your PR or after; the maintainer
would need two separate branches to support both versions of Drill.
Such breaking of (implied) API compatibility is often considered a "bad thing."
We may not want to complicate the lives of those who have graciously created
Drill extensions and integrations.
Finally, if we change anything visible from SqlLine, we break running
applications, which we almost certainly do not want to do. See the changes to
Types.java as an example.
Can you make the change in a way that all your changes fall only into group 1,
provide a gradual migration for group 2, and do not change anything in group 3?
For example; the MinorType enum is a de-facto public API and must retain MAP
with its current meaning, at least for some number of releases. You could add a
STRUCT enum and mark MAP deprecated, and encourage third-party code to migrate.
But we must still support MAP for some period of time to provide time for the
migration. Then, add the new "map" as, say KVMAP, TRUEMAP, KVPAIRS, HIVEMAP,
MAP2 or whatever. (Awkward, yes, but necessary.) In the future, when the old
MAP enum value is retired, it can be repurposed as an alias for KVMAP (or
whatever), and the KVMAP enum marked as deprecated, to be removed after several
more releases.
Similarly the SQL "MAP" type keyword cannot change, nor can the name of any SQL
function (UDF) that use the "map" term. These changes will break SQL created by
users which generally does not end well. Again, you can add a new alias, and
encourage use of that alias.
One could certainly argue that making a breaking change will impact a limited
number of people, and that the benefit justifies the cost. I'll leave that
debate to others, focusing here on the mechanics.
Thanks,
- Paul
On Friday, May 31, 2019, 12:06:35 AM PDT, Igor Guzenko
<[email protected]> wrote:
Hello Drillers,
I'm working on the renaming of Map vector[1] and related stuff to make
space for new canonical Map<K,V> vector [2] [3]. I believe this
renaming causes big impact on Drill and related client's code
(ODBC/JDBC).
So I'd like to be sure that this renaming is really necessary and
everybody agrees with the changes. Please check the draft PR [4] and
reply on the email.
Alternative solution is simply leave current map vector as is and name
newly created Map<K,V> vector (+readers, writers etc.) differently.
[1] https://issues.apache.org/jira/browse/DRILL-7097
[2] https://issues.apache.org/jira/browse/DRILL-7096
[3]
https://docs.google.com/presentation/d/1FG4swOrkFIRL7qjiP7PSOPy8a1vnxs5Z9PM3ZfRPRYo/edit#slide=id.p
[4] https://github.com/apache/drill/pull/1803
Thanks, Igor Guzenko