What do you think about dateless timestamps? AFAIK that is not supported
ATM, shouldn't we drop it?

Gabor

On Wed, Mar 18, 2020 at 1:46 AM Shant Hovsepian <sh...@cloudera.com> wrote:

> +1 on RUNTIME_FILTER_WAIT_TIME_MS increasing.
>
> On Tue, Mar 17, 2020 at 5:43 PM Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
> >
> > I think we should consider changing a couple more defaults, after having an
> > offline conversation with Shant.
> >
> > We could change COMPRESSION_CODEC to LZ4 or ZSTD as the default. I think
> > LZ4 is the safest option perf-wise, because it will be faster across the
> > board and the decompression is now one of the main CPU bottlenecks for
> > Parquet scanning. We might need to double-check that enough of the
> > ecosystem supports LZ4, but this seems like it would be a good
> > improvement.
> >
> > It *might* be worth enabling compute stats table sampling by default, but I
> > think that could be open for discussion.
> >
> > We could also consider bumping RUNTIME_FILTER_WAIT_TIME_MS to a higher
> > value, since I think generally higher values have proven to be more robust
> > for complex queries (TPC-DS, etc).
> >
> > On Tue, Mar 17, 2020 at 11:56 AM Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > >   - Do we still need the DECIMAL_V2 query option? Seems like this has
> > > >     been true for a while. Maybe we can add it to the list of deprecated
> > > >     flags?
> > > Maybe we could officially deprecate it and phase it out soonish? It really
> > > only exists as a workaround for people upgrading from the old behaviour in
> > > 2.x. It hasn't been terribly bad maintaining the two code paths, but it
> > > would be nice to simplify it.
> > >
> > > >   - Deprecate support for ADLS, since it has effectively been replaced
> > > >     by ABFS
> > > Makes sense. It probably isn't too much overhead to keep the old code
> > > around for a while, is it? Just in case users have a bunch of data still
> > > sitting in the old ADLS.
> > >
> > > >   - Deprecate (or even remove) support for HDFS caching? Not sure how
> > > >     extensively this is used, removing the code would be nice as it
> > > >     simplifies part of the HDFS read path
> > > Anecdotally I do see it used, but a lot of times it's to affect scheduling
> > > rather than because saving memcpy() makes a real difference (with
> > > compressed parquet, that's rarely the bottleneck). A compromise or
> > > in-between step would be to remove the special-casing of the zero-copy code
> > > path in the backend, but keep the scheduling behaviour.
> > >
> > > On Tue, Mar 17, 2020 at 11:50 AM Tim Armstrong <tarmstr...@cloudera.com>
> > > wrote:
> > >
> > >> I think I generally support this. A few specific comments.
> > >>
> > >> > Proposal 3: Impala-lzo
> > >> > Drop support for Impala-lzo/hadoop-lzo
> > >>
> > >> Does this mean dropping the plugin text scanner interface entirely? LZO
> > >> is the only implementation of that that I'm aware of (and we rely on it to
> > >> test the interface) so it seems reasonable to me to remove something that
> > >> has minimal adoption and is not cleanly separated from the scanner
> > >> implementation of core Impala.
> > >>
> > >> > Proposal 5: Sentry
> > >> > Drop support for Sentry in favor of Ranger.
> > >>
> > >> I think moving this direction makes a lot of sense given that activity in
> > >> the Sentry project has declined a lot (just look at the activity level on
> > >> the two projects, it's dramatically different), unless someone in the
> > >> community wants to step up and maintain the integration.
> > >>
> > >> > Proposal 6: Metadata
> > >> > Metadata V2 will become the default. Metadata V1 will be deprecated.
> > >> Maybe we should set a goal of removing the support in Impala 4.1 or 4.2?
> > >> That would allow us to remove a lot of complex code
> > >>
> > >> On Mon, Mar 16, 2020 at 10:07 AM Joe McDonnell <joemcdonn...@cloudera.com>
> > >> wrote:
> > >>
> > >>> Now that Impala 3.4 is branched and master is Impala 4.0, we need to
> > >>> decide what breaking changes will happen in Impala 4.0. I have provided a
> > >>> series of proposals below. I welcome feedback on them. Other proposals are
> > >>> also welcome.
> > >>>
> > >>> Thanks,
> > >>> Joe
> > >>>
> > >>> Proposal 0: Hadoop component versions
> > >>>
> > >>> Switch to CDP versions of components by default. This means that Impala
> > >>> will use Hive 3+ (which is already essentially Hive 4 and may change
> > >>> names to being Hive 4).
> > >>> Remove support for CDH versions of components.
> > >>> This was already discussed in the original thread for Impala 4, so this
> > >>> is not new.
> > >>>
> > >>> Proposal 1: OS support
> > >>>
> > >>> Drop support for Centos 6, Ubuntu 14, and Debian (all versions)
> > >>> Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> > >>> Centos 7 development will be focused on newer Centos 7 versions such as
> > >>> 7.6 and 7.7.
> > >>> Add support for Centos 8
> > >>> Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > >>>
> > >>> Proposal 2: Python support
> > >>>
> > >>> Drop support for Python 2.6
> > >>> Add support for Python 3 over time.
> > >>>
> > >>> Proposal 3: Impala-lzo
> > >>>
> > >>> Drop support for Impala-lzo/hadoop-lzo
> > >>>
> > >>> Proposal 4: Clients
> > >>>
> > >>> Deprecate beeswax protocol. This means that it can be removed in the next
> > >>> major version number, but it would not be removed in Impala 4. Current
> > >>> users of beeswax would need to start migrating to HS2.
> > >>>
> > >>> Proposal 5: Sentry
> > >>>
> > >>> Drop support for Sentry in favor of Ranger.
> > >>>
> > >>> Proposal 6: Metadata
> > >>>
> > >>> Metadata V2 will become the default. Metadata V1 will be deprecated.
> > >>>
> > >>> Thanks,
> > >>> Joe
> > >>>
> > >>
>
