Here's another one regarding support of ordinals in HAVING clauses.
https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L54
https://issues.apache.org/jira/browse/IMPALA-7844

-Shant

On Fri, May 8, 2020 at 10:35 AM Sahil Takiar <takiar.sa...@gmail.com> wrote:

> > Another aspect is that ACID inserts are probably faster, especially on
> > object stores like S3.
>
> Note that
> https://impala.apache.org/docs/build/html/topics/impala_s3_skip_insert_staging.html
> allows for direct writes to S3 (no staging directory), although this does
> not work for INSERT OVERWRITE queries.
>
> On Fri, May 8, 2020 at 1:44 AM Zoltán Borók-Nagy <borokna...@apache.org>
> wrote:
>
> > About transactional tables:
> > If there's an ACID base directory in the table (due to compaction or
> > INSERT OVERWRITE), then files at the table/partition root level will be
> > ignored. So in that case Spark would need to do ACID-aware inserts.
> >
> > Another aspect is that ACID inserts are probably faster, especially on
> > object stores like S3. The reason for this is that we don't need to
> > create a staging directory and move (which is a copy on S3) files to
> > their final location. However, read amplification is definitely greater
> > for ACID tables.
> >
> > Btw, do we want to achieve consistent default behavior with an upstream
> > Hive version?
> >
> > That said, I think creating non-transactional tables is a good default,
> > especially because Impala will probably support Hudi and Iceberg in the
> > future, so it's probably better to let the users choose explicitly.
> >
> > - Zoltan
> >
> >
> > On Thu, May 7, 2020 at 11:46 PM Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > That's a pretty good argument against defaulting to transactional
> > > tables. You are right that it doesn't work out of the box with most
> > > other engines: writing files into the base directory of the
> > > table/partition will not work as intended, afaik.
> > >
> > > On Thu, May 7, 2020 at 1:10 PM Shant Hovsepian <sh...@cloudera.com>
> > > wrote:
> > >
> > > > How compatible with other engines is the insert-only transactional
> > > > type?
> > > >
> > > > Very often data is loaded with Spark, especially for cases with
> > > > complex types where it's the only option. Will landing Parquet files
> > > > in the table path just work even if we don't get consistent inserts,
> > > > or does Spark need to be aware of the table format in either case?
> > > >
> > > > -Shant
> > > >
> > > > On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <takiar.sa...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 on query results spooling. I've been thinking about enabling it
> > > > > by default recently since it seems to be relatively stable.
> > > > >
> > > > > On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <tarmstr...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > I'm going to revive this thread. I thought of a few more defaults
> > > > > > that we might want to change. These are default changes we
> > > > > > (putting on Cloudera hat temporarily) have made for some new
> > > > > > production deployments and have been happy with.
> > > > > >
> > > > > > Query result spooling has a bunch of advantages for resource
> > > > > > consumption and fetch speed. It uses a bounded amount of memory
> > > > > > and scratch space, but I think it's overall a better default.
> > > > > > We've been using it in production for a while now and haven't had
> > > > > > any issues.
> > > > > >
> > > > > > https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
> > > > > >
> > > > > > I think we should also switch the default file format to Parquet,
> > > > > > because it's more correct (default text has some issues with
> > > > > > escaping) and because it's more performant.
> > > > > >
> > > > > > https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
> > > > > >
> > > > > > We could also consider creating insert_only transactional tables
> > > > > > by default (see
> > > > > > https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html).
> > > > > > The pros and cons here are more complex: we get more consistent
> > > > > > behaviour by default, but there can be perf/scalability
> > > > > > consequences.
> > > > > >
> > > > > > Any objections or thoughts on these?
> > > > > >
> > > > > > On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <tarmstr...@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I think ARM support can ship in whatever release it's ready in,
> > > > > > > since it's not a breaking change.
> > > > > > >
> > > > > > > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zhaoren...@hotmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks
> > > > > > >> I will work hard on this ^_^
> > > > > > >>
> > > > > > >> ________________________________
> > > > > > >> From: Jim Apple <apa...@jbapple.com>
> > > > > > >> Sent: March 19, 2020 10:21
> > > > > > >> To: dev@impala.apache.org <dev@impala.apache.org>
> > > > > > >> Subject: Re: Impala 4.0 breaking changes
> > > > > > >>
> > > > > > >> I agree. I don't know how far we are from having arm64 support,
> > > > > > >> though, and we might not get there for a 4.0 release, I'd
> > > > > > >> guess. But that doesn't mean it couldn't arrive by the time for
> > > > > > >> 4.1 or 4.7 or 5.55 or whatever.
> > > > > > >>
> > > > > > >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <joemcdonn...@cloudera.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Patches to add support for arm64 are definitely welcome in any
> > > > > > >> > release.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Joe
> > > > > > >> >
> > > > > > >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zhaoren...@hotmail.com>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Hi
> > > > > > >> > >
> > > > > > >> > > Could we add support for arm64?
> > > > > > >> > >
> > > > > > >> > > Thanks
> > > > > > >> > > Zhao Renhai
> > > > > > >> > >
> > > > > > >> > > ________________________________
> > > > > > >> > > From: Joe McDonnell <joemcdonn...@cloudera.com>
> > > > > > >> > > Sent: March 17, 2020 1:07
> > > > > > >> > > To: dev@impala.apache.org <dev@impala.apache.org>
> > > > > > >> > > Subject: Impala 4.0 breaking changes
> > > > > > >> > >
> > > > > > >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we
> > > > > > >> > > need to decide what breaking changes will happen in Impala
> > > > > > >> > > 4.0. I have provided a series of proposals below. I welcome
> > > > > > >> > > feedback on them. Other proposals are also welcome.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Joe
> > > > > > >> > >
> > > > > > >> > > Proposal 0: Hadoop component versions
> > > > > > >> > >
> > > > > > >> > > Switch to CDP versions of components by default. This means
> > > > > > >> > > that Impala will use Hive 3+ (which is already essentially
> > > > > > >> > > Hive 4 and may be renamed to Hive 4).
> > > > > > >> > > Remove support for CDH versions of components.
> > > > > > >> > > This was already discussed in the original thread for
> > > > > > >> > > Impala 4, so this is not new.
> > > > > > >> > >
> > > > > > >> > > Proposal 1: OS support
> > > > > > >> > >
> > > > > > >> > > Drop support for CentOS 6, Ubuntu 14, and Debian (all
> > > > > > >> > > versions).
> > > > > > >> > > Retain support for Ubuntu 16, Ubuntu 18, CentOS 7, and
> > > > > > >> > > SLES 12.
> > > > > > >> > > CentOS 7 development will be focused on newer CentOS 7
> > > > > > >> > > versions such as 7.6 and 7.7.
> > > > > > >> > > Add support for CentOS 8.
> > > > > > >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > > > > > >> > >
> > > > > > >> > > Proposal 2: Python support
> > > > > > >> > >
> > > > > > >> > > Drop support for Python 2.6.
> > > > > > >> > > Add support for Python 3 over time.
> > > > > > >> > >
> > > > > > >> > > Proposal 3: Impala-lzo
> > > > > > >> > >
> > > > > > >> > > Drop support for Impala-lzo/hadoop-lzo.
> > > > > > >> > >
> > > > > > >> > > Proposal 4: Clients
> > > > > > >> > >
> > > > > > >> > > Deprecate the Beeswax protocol. This means that it can be
> > > > > > >> > > removed in the next major version, but it would not be
> > > > > > >> > > removed in Impala 4. Current users of Beeswax would need to
> > > > > > >> > > start migrating to HS2.
> > > > > > >> > >
> > > > > > >> > > Proposal 5: Sentry
> > > > > > >> > >
> > > > > > >> > > Drop support for Sentry in favor of Ranger.
> > > > > > >> > >
> > > > > > >> > > Proposal 6: Metadata
> > > > > > >> > >
> > > > > > >> > > Metadata V2 will become the default. Metadata V1 will be
> > > > > > >> > > deprecated.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Joe
> > > > >
> > > > > --
> > > > > Sahil Takiar
> > > > > Software Engineer
> > > > > takiar.sa...@gmail.com | (510) 673-0309
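Zoltan's point about ACID base directories can be made concrete with a small model. The sketch below is a simplified, hypothetical reading of the Hive ACID layout, not Impala's or Hive's actual code: it assumes directories named base_<writeId> and delta_<minWriteId>_<maxWriteId> under a partition, ignores open or aborted write IDs and delete deltas, and the function name visible_files is made up for illustration. It shows why a Parquet file that Spark lands directly in the partition root stops being read once a base directory exists.

```python
import re

# Simplified, hypothetical model of a Hive ACID partition layout (assumption):
#   base_<writeId>/...                written by compaction or INSERT OVERWRITE
#   delta_<minWriteId>_<maxWriteId>/... written by ACID inserts
#   <file> at the partition root      "original" files, e.g. landed by Spark
# Open/aborted write IDs and delete deltas are deliberately ignored.
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def visible_files(paths):
    """Return the partition-relative paths a reader would scan (toy model)."""
    bases = {}      # writeId -> files inside that base directory
    deltas = []     # (minWriteId, path) for files inside delta directories
    originals = []  # files sitting directly in the partition root
    for path in paths:
        parts = path.split("/")
        if len(parts) == 1:
            originals.append(path)
            continue
        top = parts[0]
        m = BASE_RE.match(top)
        if m:
            bases.setdefault(int(m.group(1)), []).append(path)
            continue
        m = DELTA_RE.match(top)
        if m:
            deltas.append((int(m.group(1)), path))
    if bases:
        # The newest base supersedes root-level files and older deltas.
        base_id = max(bases)
        result = list(bases[base_id])
        result += [p for min_id, p in deltas if min_id > base_id]
    else:
        # Without a base, root-level files are read alongside all deltas.
        result = originals + [p for _, p in deltas]
    return sorted(result)
```

For example, visible_files(["part-0.parq", "base_5/f1.parq", "delta_6_6/f2.parq"]) leaves out part-0.parq, which is the situation where Spark would need ACID-aware inserts.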
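The staging-directory cost that Zoltan and Sahil mention comes down to object stores such as S3 having no cheap rename: a move is a copy followed by a delete. The toy store below is only an illustration; ObjectStore, insert and the _staging prefix are hypothetical names, not Impala's. It counts the bytes written for the same INSERT with and without a staging step.

```python
class ObjectStore:
    """Toy key-value object store where a rename is a copy plus a delete."""

    def __init__(self):
        self.objects = {}
        self.bytes_written = 0

    def put(self, key, data):
        self.objects[key] = data
        self.bytes_written += len(data)

    def rename(self, src, dst):
        # No atomic rename: copy the object to its new key, then delete the old one.
        self.put(dst, self.objects[src])
        del self.objects[src]


def insert(store, table_dir, files, staging):
    """Write `files` (name -> bytes) into table_dir, optionally via a staging prefix."""
    if staging:
        staging_dir = f"{table_dir}/_staging"  # hypothetical staging location
        for name, data in files.items():
            store.put(f"{staging_dir}/{name}", data)
        # Final step: move every staged file to its real location.
        for name in files:
            store.rename(f"{staging_dir}/{name}", f"{table_dir}/{name}")
    else:
        # Direct write: files go straight to their final keys.
        for name, data in files.items():
            store.put(f"{table_dir}/{name}", data)
```

Under this model a staged insert writes every byte twice, while a direct write (as with an ACID insert, or the S3 skip-staging option linked in the thread) writes it once. The model leaves out what staging usually buys: readers do not see partially written files until the final move.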
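Tim describes query result spooling as using a bounded amount of memory and scratch space. The class below is a toy model of that bound, not Impala's implementation: SpoolingBuffer and its try_put/fetch methods are hypothetical, and it budgets rows rather than bytes for simplicity. Rows count against the memory budget first, overflow is counted as spilled to scratch, and once both budgets are used the producer has to wait for the client to fetch.

```python
from collections import deque


class SpoolingBuffer:
    """Toy bounded result buffer: a memory budget first, then scratch space.

    Limits are in rows purely for simplicity; a real system would budget bytes.
    """

    def __init__(self, mem_rows, spill_rows):
        self.mem_rows = mem_rows
        self.spill_rows = spill_rows
        self.rows = deque()

    @property
    def spilled(self):
        # Rows beyond the in-memory budget are counted as spilled to scratch.
        return max(0, len(self.rows) - self.mem_rows)

    def try_put(self, row):
        """Producer side: False means both budgets are full and the producer must wait."""
        if len(self.rows) >= self.mem_rows + self.spill_rows:
            return False
        self.rows.append(row)
        return True

    def fetch(self, n):
        """Client side: return up to n rows in the order they were produced."""
        return [self.rows.popleft() for _ in range(min(n, len(self.rows)))]
```

The idea behind the design is that the query can run ahead of a slow client, and finish if its results fit in the budget, while memory and scratch use stay capped.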