Another benefit of depending on Apache releases is that we can avoid downloading lots of binaries from Cloudera's S3 buckets. For example, downloading the Hadoop and Hive binaries from the Apache mirrors is much faster.
I think we can create another branch for this purpose, but I'm not sure what the branch name should be. I think 3.4.1 should only be used for backporting bug fixes for 3.4.0.

On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:

> We did manage to switch to ASF Kudu master via native-toolchain by default, but that was probably the easiest switch. I don't think we've tried pinning to a Kudu release for our official release, but it's probably doable. I think the main concern would be if there wasn't a Kudu release available with a feature we depended on.
>
> Maybe ASF Hadoop would be the next easiest, since Impala doesn't directly depend on YARN and the HDFS APIs have tended to be quite stable. The main changes we've depended on from the HDFS codebase are client changes (like hdfsUnbuffer() support). I can imagine we might have to reconcile some of those to get things working correctly against ASF Hadoop, but that would be achievable (it would basically mean switching back to using older APIs in ASF mode).
>
> On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <csringho...@cloudera.com> wrote:
>
> > > Another motivation is that we need a branch to maintain the Sentry support which is removed in the 4.0 branch
> >
> > +1, it would be great to have a support branch with Sentry
> >
> > > More ambitiously, I'd love it if releases were compatible with official released versions of our ASF dependencies like Hadoop and Ranger
> >
> > Switching completely to ASF released dependencies looks like a potentially very hard task to me, for two reasons:
> > 1. We have several dependencies (Hadoop, Hive, Sentry, Ranger, HBase, Kudu, probably some others), and if we can't find a proper release for even a single one of these, we would be stuck and would have to wait for another community.
> > As an example, SENTRY-2549 <https://issues.apache.org/jira/browse/SENTRY-2549> is not even merged yet, while it breaks nearly all of our authorization tests.
> > 2. Some of our tests depend deeply on the exact behaviour of some components, e.g. we may assume a given table to have a certain number of files or a certain size, which can easily be broken by valid differences in Hive or parquet-mr/ORC.
> > This can lead to the dilemma of either (a) rewriting a lot of tests or (b) skipping them, making the test coverage weaker.
> >
> > A step in this direction could be to add a flag to the build like USE_ASF_DEPENDENCIES, which would replace some or all CDH dependencies with ASF ones, and see how the build goes: e.g. whether the build/dataload is successful and, if so, which tests are red. I think that releasing with CDH dependencies plus some information about the state with ASF ones could already be a big improvement for adoption, even if not everything works. E.g. someone may simply want to try Impala in a Hadoop + Hive cluster and not care about authorization or HBase/Kudu.
> >
> > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
> >
> > > Yeah, I suppose it depends on the version and the bug fixes. Sometimes it's also new features, which it would be good to feature gate anyway. IIRC, at some point Impala wouldn't build against any released Hive version, because Hive 3.0 made changes to Hive 2.0 that Impala couldn't deal with, and Hive 2.x didn't contain a feature Impala depended on just to compile the frontend. Or maybe it was Hadoop.
> > >
> > > If we could set an example with older releases, I think it would be lovely and perhaps help adoption, too!
> > >
> > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang <huangquanl...@gmail.com> wrote:
> > >
> > > > Yeah, it'd be good to depend on Apache official versions.
> > > > In my understanding, we depend on CDH/CDP snapshot versions because we need some bug fixes that haven't been released in Apache official versions. So it's more suitable to do this for an older release like impala-3.4, because all of its dependent features/bug fixes may already exist in some Apache official versions.
> > > >
> > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org> wrote:
> > > >
> > > > > I think this is the right choice.
> > > > >
> > > > > More ambitiously, I'd love it if releases were compatible with official released versions of our ASF dependencies like Hadoop and Ranger. Perhaps this would limit the Cloudera Maven dependencies for devs.
> > > > >
> > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang <huangquanl...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Due to Cloudera's Maven repo changes, the latest released version, 3.4.0, is not compilable now (it needs the patch for IMPALA-9815). I'm thinking about doing a minor release, 3.4.1.
> > > > > >
> > > > > > Another motivation is that we need a branch to maintain the Sentry support which is removed in the 4.0 branch (IMPALA-9708). One bug we recently found is IMPALA-10326 (PrincipalPrivilegeTree doesn't handle empty strings and wildcards correctly). We have a fix downstream but can't put it upstream because upstream no longer has the Sentry support. IMPALA-10130 is another Sentry issue that we may need to fix.
> > > > > >
> > > > > > We can also apply some critical fixes in this version.
> > > > > > Here are the bugs that affect 3.4.0 and are fixed in 4.0:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > >
> > > > > > Any objections or suggestions?
> > > > > >
> > > > > > Thanks,
> > > > > > Quanlong