I think a 3.4.1 branch is a good idea. It is nice to have a branch that
can accept changes to fix Sentry issues. Also, the Maven repo changes were
a surprise for everyone, and the latest release should always be buildable.

On a side note, if we want to scrutinize where all the jars are coming
from, the Maven logs from
https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to grep,
and that job prints some statistics about how many artifacts come from
each repo. Here is the output from the latest run on the master branch:

Number of artifacts downloaded from each repo:
     16 cdh.rcs.releases.repo
   2067 central
    203 impala.cdp.repo
      2 impala.toolchain.kudu.repo

Thanks,
Joe

On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com> wrote:

> Another benefit of depending on Apache releases is we can avoid
> downloading lots of binaries from Cloudera's S3 buckets. E.g. downloading
> Hadoop, Hive binaries from the Apache mirrors is much faster.
>
> I think we can create another branch for this purpose. But not sure what
> the branch name should be. I think 3.4.1 should only be used for
> backporting bug fixes for 3.4.0.
>
> On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > We did manage to switch to ASF Kudu master via native-toolchain by
> > default, but that was probably the easiest switch. I don't think we've
> > tried pinning to a Kudu release for our official release, but it's
> > probably doable. I think the main concern would be if there wasn't a
> > Kudu release available with a feature we depended on.
> >
> > Maybe ASF Hadoop would be the next easiest, since Impala doesn't
> > directly depend on YARN and the HDFS APIs have tended to be quite
> > stable. The main changes we've depended on from the HDFS codebase are
> > client changes (like hdfsUnbuffer() support) - I can imagine we might
> > have to reconcile some of those to get things working correctly against
> > ASF Hadoop, but that would be achievable (it would basically mean
> > switching back to using older APIs in ASF mode).
> >
> > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <csringho...@cloudera.com>
> > wrote:
> >
> > > > Another motivation is that we need a branch to maintain the Sentry
> > > > support which is removed in the 4.0 branch
> > >
> > > +1, it would be great to have a support branch with Sentry
> > >
> > > > More ambitiously, I'd love it if releases were compatible with
> > > > official released versions of our ASF dependencies like Hadoop and
> > > > Ranger
> > >
> > > Switching completely to ASF released dependencies looks like a
> > > potentially very hard task to me for two reasons:
> > > 1. We have several dependencies - Hadoop, Hive, Sentry, Ranger, HBase,
> > > Kudu, probably some others - and if we can't find a proper release for
> > > even a single one of these, then we would be stuck and would have to
> > > wait for another community.
> > > As an example, SENTRY-2549
> > > <https://issues.apache.org/jira/browse/SENTRY-2549> is not even merged
> > > yet, while it breaks nearly all of our authorization tests.
> > > 2. Some of our tests depend deeply on the exact behaviour of some
> > > components, e.g. we may assume a given table to have a certain number
> > > of files or size, which can be easily broken by valid differences in
> > > Hive or parquet-mr/ORC.
> > > This can lead to the dilemma of a: rewriting a lot of tests b:
> > > skipping them, making the test coverage weaker.
> > >
> > > A step in this direction could be to add a flag to the build like
> > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
> > > dependencies with ASF ones, and see how the build goes, e.g. is the
> > > build/dataload successful, and if yes, which tests are red. I think
> > > that releasing with CDH dependencies + adding some information about
> > > the state with ASF ones could already be a big improvement for
> > > adoption, even if not everything works. E.g. someone may simply want
> > > to try Impala in a Hadoop + Hive cluster and not care about
> > > authorization or HBase/Kudu.
> > >
> > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
> > >
> > > > Yeah, I suppose it depends on the version and the bug fixes.
> > > > Sometimes it's also new features, which it would be good to feature
> > > > gate anyway. IIRC, at some point Impala wouldn't build against any
> > > > released Hive version because Hive 3.0 made changes to Hive 2.0 that
> > > > Impala couldn't deal with and Hive 2.x didn't contain a feature
> > > > Impala depended on just to compile the frontend. Or maybe it was
> > > > Hadoop.
> > > >
> > > > If we could set an example with older releases, I think it would be
> > > > lovely and perhaps help adoption, too!
> > > >
> > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang
> > > > <huangquanl...@gmail.com> wrote:
> > > >
> > > > > Yeah, it'd be good to depend on Apache official versions. In my
> > > > > understanding, we depend on cdh/cdp snapshot versions since we
> > > > > need some bug fixes that haven't been released in Apache official
> > > > > versions. So it's more suitable to do this for an older release
> > > > > like impala-3.4, because all its dependent features/bug fixes may
> > > > > already exist in some Apache official versions.
> > > > >
> > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I think this is the right choice.
> > > > > >
> > > > > > More ambitiously, I'd love it if releases were compatible with
> > > > > > official released versions of our ASF dependencies like Hadoop
> > > > > > and Ranger. Perhaps this would limit the Cloudera Maven
> > > > > > dependencies for devs.
> > > > > >
> > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang
> > > > > > <huangquanl...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Due to Cloudera's Maven repo changes, the latest released
> > > > > > > version, 3.4.0, is no longer compilable (it needs the patch
> > > > > > > for IMPALA-9815). I'm thinking about doing a minor release,
> > > > > > > 3.4.1.
> > > > > > >
> > > > > > > Another motivation is that we need a branch to maintain the
> > > > > > > Sentry support which is removed in the 4.0 branch
> > > > > > > (IMPALA-9708). One bug we recently found is IMPALA-10326
> > > > > > > (PrincipalPrivilegeTree doesn't handle empty string and
> > > > > > > wildcards correctly). We have a fix in downstream but can't
> > > > > > > put it upstream due to missing the Sentry support.
> > > > > > > IMPALA-10130 is another Sentry issue that we may need to fix.
> > > > > > >
> > > > > > > We can also apply some critical fixes in this version. Here
> > > > > > > are bugs that affect 3.4.0 and are fixed in 4.0:
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > > >
> > > > > > > Any objections or suggestions?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Quanlong
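
P.S. For anyone who wants to reproduce the per-repo artifact count quoted at the top of this thread from a saved console log, here is a minimal sketch. It is not the script the Jenkins job actually runs. It assumes the log contains Maven's usual "Downloaded from <repo-id>: <url>" lines, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the log contains lines in the form Maven 3.5+ prints, e.g.
#   [INFO] Downloaded from central: https://repo.maven.apache.org/... (3.9 kB at 120 kB/s)
# "Downloading from ..." lines are deliberately not matched, so each
# artifact is counted once, when its download completes.
DOWNLOADED = re.compile(r"Downloaded from ([^:\s]+):")


def count_artifacts_by_repo(log_lines):
    """Count completed downloads per repository id (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = DOWNLOADED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_report(counts):
    """Render the counts in the same shape as the job output quoted above."""
    lines = ["Number of artifacts downloaded from each repo:"]
    for repo in sorted(counts):
        lines.append(f"{counts[repo]:>7} {repo}")
    return "\n".join(lines)
```

To use it, pass any iterable of log lines, for example an open file handle for a downloaded console log, to count_artifacts_by_repo() and print the result of format_report().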