For the naming convention, we could follow the example set by the 2.x
branch, which was created when major incompatible changes, including major
changes in Impala's Hadoop dependencies, started landing on the master
branch.
We could create 3.x as a long-term branch for Sentry fixes, Hadoop
dependency changes and general maintenance. At the same time, 3.4.1 can
also be released as a "maintenance" or bugfix release, ensuring that
Impala 3.4.[x1] is buildable again.

Thanks,
- Laszlo

On Wed, Dec 9, 2020 at 6:39 AM Joe McDonnell <joemcdonn...@cloudera.com>
wrote:

> I think a 3.4.1 branch is a good idea. It is nice to have a branch that can
> accept changes to fix Sentry issues. Also, the Maven repo changes were a
> surprise for everyone, and the latest release should always be buildable.
>
> On a side note, if we want to scrutinize where all the jars are coming
> from, the maven logs from
> https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to grep
> and that job prints some statistics about how many artifacts come from each
> repo. Here is the output from the latest run on the master branch:
>
> Number of artifacts downloaded from each repo:
>      16 cdh.rcs.releases.repo
>    2067 central
>     203 impala.cdp.repo
>       2 impala.toolchain.kudu.repo
>
> Thanks,
>
> Joe
>
>
> On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com>
> wrote:
>
> > Another benefit of depending on Apache releases is we can avoid
> > downloading lots of binaries from Cloudera's S3 buckets. E.g. downloading
> > Hadoop, Hive binaries from the Apache mirrors is much faster.
> >
> > I think we can create another branch for this purpose. But not sure what
> > the branch name should be. I think 3.4.1 should only be used for
> > backporting bug fixes for 3.4.0.
> >
> >
> > On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > We did manage to switch to ASF Kudu master via native-toolchain by
> > > default, but that was probably the easiest switch. I don't think we've
> > > tried pinning to Kudu release for our official release, but it's
> > > probably doable.
> > > I think the main concern would be if there wasn't a Kudu release
> > > available with a feature we depended on.
> > >
> > > Maybe ASF Hadoop would be the next easiest, since Impala doesn't
> > > directly depend on YARN and the HDFS APIs have tended to be quite
> > > stable. The main changes we've depended on from the HDFS codebase are
> > > client changes (like hdfsUnbuffer() support) - I can imagine we might
> > > have to reconcile some of those to get things working correctly against
> > > ASF hadoop, but that would be achievable (it would basically mean
> > > switching back to using older APIs in ASF mode).
> > >
> > > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer
> > > <csringho...@cloudera.com> wrote:
> > >
> > > > > Another motivation is that we need a branch to maintain the Sentry
> > > > > support which is removed in the 4.0 branch
> > > >
> > > > +1, it would be great to have a support branch with Sentry
> > > >
> > > > > More ambitiously, I'd love it if releases were compatible with
> > > > > official released versions of our ASF dependencies like Hadoop and
> > > > > Ranger
> > > >
> > > > Switching completely to ASF released dependencies looks like a
> > > > potentially very hard task to me for two reasons:
> > > > 1. We have several dependencies - Hadoop, Hive, Sentry, Ranger, HBase,
> > > > Kudu, probably some others - even if we can't find a proper release
> > > > for a single one of these, then we would be stuck and would have to
> > > > wait for another community.
> > > > As an example SENTRY-2549
> > > > <https://issues.apache.org/jira/browse/SENTRY-2549> is not even merged
> > > > yet, while it breaks nearly all of our authorization tests.
> > > > 2. Some of our tests depend deeply on the exact behaviour of some
> > > > components, e.g.
> > > > we may assume a given table to have a certain amount of files or
> > > > size, which can be easily broken by valid differences in Hive or
> > > > parquet-mr/ORC.
> > > > This can lead to the dilemma of (a) rewriting a lot of tests or
> > > > (b) skipping them, making the test coverage weaker.
> > > >
> > > > A step in this direction could be to add a flag to the build like
> > > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
> > > > dependencies with ASF ones, and see how the build goes, e.g. is the
> > > > build/dataload successful, and if yes, then what tests are red. I
> > > > think that releasing with CDH dependencies + adding some information
> > > > about the state with ASF ones could be already a big improvement for
> > > > adoption, even if not everything works. E.g. someone may simply want
> > > > to try Impala in a Hadoop + Hive cluster and not care about
> > > > authorization or HBase/Kudu.
> > > >
> > > >
> > > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
> > > >
> > > > > Yeah, I suppose it depends on the version and the bug fixes.
> > > > > Sometimes it's also new features, which it would be good to feature
> > > > > gate anyway. IIRC, at some point Impala wouldn't build against any
> > > > > released Hive version because Hive 3.0 made changes to Hive 2.0
> > > > > that Impala couldn't deal with and Hive 2.x didn't contain a
> > > > > feature Impala depended on just to compile the frontend. Or maybe
> > > > > it was Hadoop.
> > > > >
> > > > > If we could set an example with older releases, I think it would be
> > > > > lovely and perhaps help adoption, too!
> > > > >
> > > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang
> > > > > <huangquanl...@gmail.com> wrote:
> > > > >
> > > > > > Yeah, it'd be good to depend on Apache official versions.
> > > > > > In my understanding, we depend on cdh/cdp snapshot versions since
> > > > > > we need some bug fixes that haven't been released in Apache
> > > > > > official versions. So it's more suitable to do this for an older
> > > > > > release like impala-3.4. Because all its dependent features/bug
> > > > > > fixes may already exist in some Apache official versions.
> > > > > >
> > > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I think this is the right choice.
> > > > > > >
> > > > > > > More ambitiously, I'd love it if releases were compatible with
> > > > > > > official released versions of our ASF dependencies like Hadoop
> > > > > > > and Ranger. Perhaps this would limit the Cloudera Maven
> > > > > > > dependencies for devs.
> > > > > > >
> > > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang
> > > > > > > <huangquanl...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Due to Cloudera's maven repo changes, the latest released
> > > > > > > > version 3.4.0 is not compilable now (need the patch of
> > > > > > > > IMPALA-9815). I'm thinking about doing a minor release for
> > > > > > > > 3.4.1.
> > > > > > > >
> > > > > > > > Another motivation is that we need a branch to maintain the
> > > > > > > > Sentry support which is removed in the 4.0 branch
> > > > > > > > (IMPALA-9708). One bug we recently found is IMPALA-10326
> > > > > > > > (PrincipalPrivilegeTree doesn't handle empty string and
> > > > > > > > wildcards correctly). We have a fix in downstream but can't
> > > > > > > > put it upstream due to missing the Sentry support.
> > > > > > > > IMPALA-10130 is another Sentry issue that we may need to fix.
> > > > > > > >
> > > > > > > > We can also apply some critical fixes in this version. Here
> > > > > > > > are bugs that affect 3.4.0 and are fixed in 4.0:
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > > > >
> > > > > > > > Any objections or suggestions?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Quanlong