I'm also in strong agreement here.
Codegen is a logical next step in my mind. It has multiple inherent
benefits, ranging from vectorised processing to runtime GPU offload
support. I think data locality and PXF performance are important, although
in pure cloud deployments, compute is what we can influence most.
Not to mention, the Greenplum team is showing good progress with
codegen; we should incorporate that work into HAWQ wherever possible.
On Mon, Oct 17, 2016, 20:26 Hong Wu <xunzhang...@gmail.com> wrote:
> Strong +1 on this.
> Performance is one of the reasons our customers choose HAWQ, and I think
> the existing performance lead comes largely from its C implementation and
> Postgres heritage. HAWQ will definitely focus on performance
> improvements, but frankly speaking, the plan/roadmap should be shaped and
> discussed in detail in threads like this one. Below are some of our
> current considerations and to-do items:
> - Codegen to optimize executor efficiency.
> - Data skipping to optimize I/O performance.
> - Optimize external table access, especially PXF.
> - Vectorized execution refactoring.
> - Optimize data locality.
> - Optimize distributed resource organization and management.
> - Optimize the interconnect communication module.
> - GPUs, SSDs
> - ...
> We run weekly performance tests for HAWQ in several cluster environments
> and keep paying attention to the latest performance updates from our
> competitors and from research papers. But we need more people focused on
> performance features, and we would very much welcome developers from the
> HAWQ open-source community joining us on the performance side.
> 2016-10-18 5:11 GMT+08:00 Michael Pearce <michael.pea...@ig.com>:
> > Hi All,
> > HAWQ is now being caught up with by some competitors in terms of
> > real-world performance, and in some cases outperformed, most notably by
> > Spark, which since Project Tungsten can run some queries faster than we can.
> > Obviously HAWQ still has the SQL-completeness advantage, but this space
> > is also slowly changing as Spark and others improve.
> > Are there any plans to start looking at improving HAWQ's execution
> > performance further with Parquet vectorisation and whole-stage codegen?
> > http://www.slideshare.net/databricks/spark-performance-whats-next
> > http://blog.2ndquadrant.com/postgresql-10-roadmap/
> > On the note of the Postgres 10 roadmap: are there any plans to update
> > compatibility, i.e. back-merge the fork from later Postgres versions?
> > AFAIK HAWQ is a fork of 8.x, which is quite dated.
> > I'm sure all of these questions have already been answered/discussed,
> > but it would be great to get some visibility into the roadmap for these
> > areas of HAWQ.
> > Cheers
> > Mike
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kd...@pivotal.io