> What do you think about dateless timestamps? AFAIK that is not supported

+1, I think that dateless timestamps are just confusing, both in the code and for the users.
I created a Jira to drop it: IMPALA-9531
A number of issues with them are listed in this Jira: IMPALA-5942
On Wed, Mar 18, 2020 at 3:16 PM Gabor Kaszab <gaborkas...@apache.org> wrote:
> What do you think about dateless timestamps? AFAIK that is not supported
> ATM, shouldn't we drop it?
>
> Gabor
>
> On Wed, Mar 18, 2020 at 1:46 AM Shant Hovsepian <sh...@cloudera.com> wrote:
> > +1 on RUNTIME_FILTER_WAIT_TIME_MS increasing.
> >
> > On Tue, Mar 17, 2020 at 5:43 PM Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > >
> > > I think we should consider changing a couple more defaults, after having an offline conversation with Shant.
> > >
> > > We could change COMPRESSION_CODEC to LZ4 or ZSTD as the default. I think LZ4 is the safest option perf-wise, because it will be faster across the board and decompression is now one of the main CPU bottlenecks for Parquet scanning. We might need to double-check that enough of the ecosystem supports LZ4, but this seems like it would be a good improvement.
> > >
> > > It *might* be worth enabling compute stats table sampling by default, but I think that could be open for discussion.
> > >
> > > We could also consider bumping RUNTIME_FILTER_WAIT_TIME_MS to a higher value, since I think higher values have generally proven to be more robust for complex queries (TPC-DS, etc.).
> > >
> > > On Tue, Mar 17, 2020 at 11:56 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > > >
> > > > > - Do we still need the DECIMAL_V2 query option? Seems like this has been true for a while. Maybe we can add it to the list of deprecated flags?
> > > >
> > > > Maybe we could officially deprecate it and phase it out soonish? It really only exists as a workaround for people upgrading from the old behaviour in 2.x. It hasn't been terribly bad maintaining the two code paths, but it would be nice to simplify it.
> > > >
> > > > > - Deprecate support for ADLS, since it has effectively been replaced by ABFS
> > > >
> > > > Makes sense. It probably isn't too much overhead to keep the old code around for a while, is it? Just in case users have a bunch of data still sitting in the old ADLS.
> > > >
> > > > > - Deprecate (or even remove) support for HDFS caching? Not sure how extensively this is used; removing the code would be nice, as it simplifies part of the HDFS read path.
> > > >
> > > > Anecdotally I do see it used, but a lot of the time it's to affect scheduling rather than because saving a memcpy() makes a real difference (with compressed Parquet, that's rarely the bottleneck). A compromise or in-between step would be to remove the special-casing of the zero-copy code path in the backend, but keep the scheduling behaviour.
> > > >
> > > > On Tue, Mar 17, 2020 at 11:50 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > > >
> > > >> I think I generally support this. A few specific comments.
> > > >>
> > > >> > Proposal 3: Impala-lzo
> > > >> > Drop support for Impala-lzo/hadoop-lzo
> > > >>
> > > >> Does this mean dropping the plugin text scanner interface entirely? LZO is the only implementation of that interface that I'm aware of (and we rely on it to test the interface), so it seems reasonable to me to remove something that has minimal adoption and is not cleanly separated from the scanner implementation of core Impala.
> > > >>
> > > >> > Proposal 5: Sentry
> > > >> > Drop support for Sentry in favor of Ranger.
> > > >>
> > > >> I think moving in this direction makes a lot of sense given that activity in the Sentry project has declined a lot (just look at the activity level on the two projects; it's dramatically different), unless someone in the community wants to step up and maintain the integration.
> > > >>
> > > >> > Proposal 6: Metadata
> > > >> > Metadata V2 will become the default. Metadata V1 will be deprecated.
> > > >>
> > > >> Maybe we should set a goal of removing the support in Impala 4.1 or 4.2? That would allow us to remove a lot of complex code.
> > > >>
> > > >> On Mon, Mar 16, 2020 at 10:07 AM Joe McDonnell <joemcdonn...@cloudera.com> wrote:
> > > >>
> > > >>> Now that Impala 3.4 is branched and master is Impala 4.0, we need to decide what breaking changes will happen in Impala 4.0. I have provided a series of proposals below. I welcome feedback on them. Other proposals are also welcome.
> > > >>>
> > > >>> Thanks,
> > > >>> Joe
> > > >>>
> > > >>> Proposal 0: Hadoop component versions
> > > >>>
> > > >>> Switch to CDP versions of components by default. This means that Impala will use Hive 3+ (which is already essentially Hive 4 and may be renamed to Hive 4).
> > > >>> Remove support for CDH versions of components.
> > > >>> This was already discussed in the original thread for Impala 4, so this is not new.
> > > >>>
> > > >>> Proposal 1: OS support
> > > >>>
> > > >>> Drop support for CentOS 6, Ubuntu 14, and Debian (all versions)
> > > >>> Retain support for Ubuntu 16, Ubuntu 18, CentOS 7, and SLES 12
> > > >>> CentOS 7 development will be focused on newer CentOS 7 versions such as 7.6 and 7.7.
> > > >>> Add support for CentOS 8
> > > >>> Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > > >>>
> > > >>> Proposal 2: Python support
> > > >>>
> > > >>> Drop support for Python 2.6
> > > >>> Add support for Python 3 over time.
> > > >>>
> > > >>> Proposal 3: Impala-lzo
> > > >>>
> > > >>> Drop support for Impala-lzo/hadoop-lzo
> > > >>>
> > > >>> Proposal 4: Clients
> > > >>>
> > > >>> Deprecate the beeswax protocol. This means that it can be removed in the next major version, but it would not be removed in Impala 4. Current users of beeswax would need to start migrating to HS2.
> > > >>>
> > > >>> Proposal 5: Sentry
> > > >>>
> > > >>> Drop support for Sentry in favor of Ranger.
> > > >>>
> > > >>> Proposal 6: Metadata
> > > >>>
> > > >>> Metadata V2 will become the default. Metadata V1 will be deprecated.
> > > >>>
> > > >>> Thanks,
> > > >>> Joe