I think a 3.4.1 branch is a good idea. It is nice to have a branch that can
accept changes to fix Sentry issues. Also, the Maven repo changes were a
surprise for everyone, and the latest release should always be buildable.

On a side note, if we want to scrutinize where all the jars are coming
from, the maven logs from
https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to grep
and that job prints some statistics about how many artifacts come from each
repo. Here is the output from the latest run on the master branch:

Number of artifacts downloaded from each repo:
     16 cdh.rcs.releases.repo
   2067 central
    203 impala.cdp.repo
      2 impala.toolchain.kudu.repo
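
For anyone who wants to reproduce this kind of count from a saved Maven log, here is a rough sketch. It is not the script the Jenkins job actually runs. It assumes the log contains lines of the form "Downloaded from <repo-id>: <url>", as recent Maven versions print, and the function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: each completed download is logged as
#   "[INFO] Downloaded from <repo-id>: <url> (...)"
# "Downloading from ..." lines (download started) are deliberately not matched.
DOWNLOADED = re.compile(r"Downloaded from ([^\s:]+): ")


def count_artifacts_by_repo(lines):
    """Count completed artifact downloads per repository id."""
    counts = Counter()
    for line in lines:
        match = DOWNLOADED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_counts(counts):
    """Render the counts in a layout similar to the summary above."""
    rows = [f"{n:7d} {repo}" for repo, n in sorted(counts.items())]
    return "\n".join(["Number of artifacts downloaded from each repo:"] + rows)


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_repos.py < maven.log
    print(format_counts(count_artifacts_by_repo(sys.stdin)))
```

Counting only "Downloaded from" lines avoids double counting, since each artifact also produces a "Downloading from" line when the transfer starts.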

Thanks,

Joe


On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com>
wrote:

> Another benefit of depending on Apache releases is we can avoid downloading
> lots of binaries from Cloudera's S3 buckets. E.g. downloading Hadoop, Hive
> binaries from the Apache mirrors is much faster.
>
> I think we can create another branch for this purpose. But not sure what
> the branch name should be. I think 3.4.1 should only be used for
> backporting bug fixes for 3.4.0.
>
>
>
> On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > We did manage to switch to ASF Kudu master via native-toolchain by default,
> > but that was probably the easiest switch. I don't think we've tried pinning
> > to a Kudu release for our official release, but it's probably doable. I think
> > the main concern would be if there wasn't a Kudu release available with
> > a feature we depended on.
> >
> > Maybe ASF Hadoop would be the next easiest, since Impala doesn't directly
> > depend on YARN and the HDFS APIs have tended to be quite stable. The main
> > changes we've depended on from the HDFS codebase are client changes (like
> > hdfsUnbuffer() support) - I can imagine we might have to reconcile some
> of
> > those to get things working correctly against ASF hadoop, but that would
> be
> > achievable (it would basically mean switching back to using older APIs in
> > ASF mode).
> >
> > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <csringho...@cloudera.com
> >
> > wrote:
> >
> > > > Another motivation is that we need a branch to maintain the Sentry
> > > support
> > > > which is removed in the 4.0 branch
> > >
> > > +1, it would be great to have a support branch with Sentry
> > >
> > > > More ambitiously, I'd love it if releases were compatible with
> official
> > > > released versions of our ASF dependencies like Hadoop and Ranger
> > >
> > > Switching completely to ASF released dependencies looks like a potentially
> > > very hard task to me for two reasons:
> > > 1. We have several dependencies - Hadoop, Hive, Sentry, Ranger, HBase,
> > > Kudu, probably some others - and if we can't find a proper release for
> > > even a single one of these, we would be stuck and would have to wait for
> > > another community.
> > >     As an example SENTRY-2549
> > > <https://issues.apache.org/jira/browse/SENTRY-2549> is not even merged
> > > yet, while it breaks nearly all of our authorization tests.
> > > 2. Some of our tests depend deeply on the exact behaviour of some
> > > components, e.g. we may assume a given table to have a certain number of
> > > files or size, which can be easily broken by valid differences in Hive or
> > > parquet-mr/ORC.
> > >     This can lead to the dilemma of a: rewriting a lot of tests b:
> > > skipping them, making the test coverage weaker.
> > >
> > > A step in this direction could be to add a flag to the build like
> > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
> > > dependencies with ASF ones, and see how the build goes, e.g. is the
> > > build/dataload successful, and if yes, which tests are red. I think that
> > > releasing with CDH dependencies + adding some information about the state
> > > with ASF ones could already be a big improvement for adoption, even if not
> > > everything works. E.g. someone may simply want to try Impala in a Hadoop +
> > > Hive cluster and not care about authorization or HBase/Kudu.
> > >
> > >
> > >
> > >
> > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
> > >
> > > > Yeah, I suppose it depends on the version and the bug fixes.
> Sometimes
> > > it's
> > > > also new features, which it would be good to feature gate anyway.
> IIRC,
> > > at
> > > > some point Impala wouldn't build against any released Hive version
> > > because
> > > > Hive 3.0 made changes to Hive 2.0 that Impala couldn't deal with and
> > Hive
> > > > 2.x didn't contain a feature Impala depended on just to compile
> > > > the frontend. Or maybe it was Hadoop.
> > > >
> > > > If we could set an example with older releases, I think it would be
> > > lovely
> > > > and perhaps help adoption, too!
> > > >
> > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang <
> huangquanl...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Yeah, it'd be good to depend on Apache official versions. In my
> > > > > understanding, we depend on cdh/cdp snapshot versions since we need some
> > > > > bug fixes that haven't been released in Apache official versions. So it's
> > > > > more suitable to do this for an older release like impala-3.4, because
> > > > > all the features/bug fixes it depends on may already exist in some Apache
> > > > > official versions.
> > > > >
> > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org>
> > wrote:
> > > > >
> > > > > > I think this is the right choice.
> > > > > >
> > > > > > More ambitiously, I'd love it if releases were compatible with
> > > official
> > > > > > released versions of our ASF dependencies like Hadoop and Ranger.
> > > > Perhaps
> > > > > > this would limit the Cloudera Maven dependencies for devs.
> > > > > >
> > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang <
> > > > huangquanl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Due to Cloudera's maven repo changes, the latest released version 3.4.0
> > > > > > > can no longer be compiled (it needs the patch for IMPALA-9815). I'm
> > > > > > > thinking about doing a minor release, 3.4.1.
> > > > > > >
> > > > > > > Another motivation is that we need a branch to maintain the
> > Sentry
> > > > > > support
> > > > > > > which is removed in the 4.0 branch (IMPALA-9708). One bug we
> > > recently
> > > > > > found
> > > > > > > is IMPALA-10326 (PrincipalPrivilegeTree doesn't handle empty
> > string
> > > > and
> > > > > > > wildcards correctly). We have a fix in downstream but can't put
> > it
> > > > > > upstream
> > > > > > > due to missing the Sentry support. IMPALA-10130 is another
> Sentry
> > > > issue
> > > > > > > that we may need to fix.
> > > > > > >
> > > > > > > We can also apply some critical fixes in this version. Here are
> > > bugs
> > > > > that
> > > > > > > affect 3.4.0 and are fixed in 4.0:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > > >
> > > > > > > Any objections or suggestions?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Quanlong
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
