List of potential work to do on Quickstep

J Patel Mon, 22 Aug 2016 18:24:06 -0700

Hi folks,

Here is a list of features that would be good for the community to work on.
Feel free to add or comment on this list.


1: Improve handling of aggregation: Aggregation in Quickstep is slow
because a separate hash table is built for each aggregate. PR
https://github.com/apache/incubator-quickstep/pull/90 is a step in fixing
this, but there is more to be done, including increasing the space
efficiency of the hash table, improving the finalize operation (which is
single-threaded), and considering partitioning (so that finalize can be
parallelized).
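
To make the partitioning idea concrete, here is a minimal sketch, not
Quickstep's actual code. All names (AggState, PartitionedAggTable,
FinalizePartition) are hypothetical, the keys and values are assumed to be
64-bit integers, and std::unordered_map stands in for whatever hash table
we end up using. One table stores the states of all aggregates for a group
together, and the table is split by key hash so that each partition can be
finalized independently.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; this is not Quickstep's API.
// One state struct holds every aggregate for a group, so a single
// hash-table probe updates SUM, COUNT and MIN together.
struct AggState {
  std::int64_t sum = 0;
  std::int64_t count = 0;
  std::int64_t min = std::numeric_limits<std::int64_t>::max();
};

struct GroupResult {
  std::int64_t key;
  std::int64_t sum;
  std::int64_t count;
  std::int64_t min;
};

class PartitionedAggTable {
 public:
  explicit PartitionedAggTable(std::size_t num_partitions)
      : partitions_(num_partitions) {}

  // Routes the key to a partition by hash and updates all aggregates.
  // The build side is single-threaded here to keep the sketch short.
  void Update(std::int64_t key, std::int64_t value) {
    const std::size_t p = std::hash<std::int64_t>{}(key) % partitions_.size();
    AggState &state = partitions_[p][key];
    state.sum += value;
    ++state.count;
    if (value < state.min) state.min = value;
  }

  std::size_t num_partitions() const { return partitions_.size(); }

  // Reads only partition p. Partitions hold disjoint keys, so calls for
  // different p values do not share any state.
  std::vector<GroupResult> FinalizePartition(std::size_t p) const {
    std::vector<GroupResult> out;
    out.reserve(partitions_[p].size());
    for (const auto &entry : partitions_[p]) {
      out.push_back({entry.first, entry.second.sum, entry.second.count,
                     entry.second.min});
    }
    return out;
  }

 private:
  std::vector<std::unordered_map<std::int64_t, AggState>> partitions_;
};
```

Since FinalizePartition is const and touches a single partition, a
scheduler could hand each partition to a different worker thread without
locking. The sketch says nothing about space efficiency, which is a
separate problem.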

2: The use of ColumnVectors is very expensive as it involves a full extra
read and write of data, and results in a bad memory access pattern. That
design needs to be rethought/refactored. Nav has suggested using an
iterator model vs. accessors, and that is a good idea. We can probably go
beyond that and think of defining patterns for taking an input, applying a
predicate, and applying a projection (copy). Any ideas here are welcome.
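
As one possible shape for such a pattern, here is a rough sketch. The name
FilterProject is hypothetical and std::vector stands in for our storage
blocks; none of this is existing Quickstep code. The point is that the
input is scanned once, and each qualifying value is projected and written
directly to the output, with no intermediate vector of values in between.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not Quickstep's API. Scans the input once, tests
// each value with pred, and appends proj(value) to the output for the
// values that qualify.
template <typename Out, typename In, typename Predicate, typename Projection>
std::vector<Out> FilterProject(const std::vector<In> &input,
                               Predicate pred,
                               Projection proj) {
  std::vector<Out> output;
  for (const In &value : input) {
    if (pred(value)) {
      output.push_back(proj(value));
    }
  }
  return output;
}
```

For example, FilterProject<long>(column, is_even, times_ten) would read
column once and produce only the projected values that pass is_even, where
is_even and times_ten are any callables supplied by the caller.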

3: We have Bloom filters, and they need to be optimized to work with joins.
Jianqiao is working on this.

4: Error handling in the system can be improved. Here we need to consider
whether to use error return codes or the C++ throw/catch mechanism. Right
now we use a mix of both. I am starting to turn in favor of throw/catch as
that way we at least have a way of catching the error at the top (rather
than crashing). We can then refactor the code to add entire throw/catch
chains. Right now the most serious error handling that is lacking, IMHO, is
when we are loading a large file and there is a corrupted tuple near the
end. The system crashes after making the user wait, and there is no
cleanup.
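
To illustrate the throw/catch direction for the bulk-load case, here is a
small sketch under simplifying assumptions. The names MalformedTuple,
LoadInts and TryLoad are hypothetical, a "tuple" is just one integer per
line, and the whole input is staged in memory, which a real loader for
large files could not afford. A bad tuple raises an exception that carries
the line number, the target table is left untouched, and the error is
caught at the top instead of crashing.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type carrying the position of the bad tuple.
class MalformedTuple : public std::runtime_error {
 public:
  MalformedTuple(std::size_t line_no, const std::string &text)
      : std::runtime_error("malformed tuple at line " +
                           std::to_string(line_no) + ": " + text) {}
};

// Parses one integer per line into a staging buffer and appends to the
// table only after the whole input parsed cleanly. If parsing throws,
// the staging buffer is destroyed and the table is unchanged.
void LoadInts(std::istream &in, std::vector<int> *table) {
  std::vector<int> staged;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::size_t pos = 0;
    int value = 0;
    try {
      value = std::stoi(line, &pos);
    } catch (const std::exception &) {
      throw MalformedTuple(line_no, line);
    }
    if (pos != line.size()) {
      throw MalformedTuple(line_no, line);  // trailing garbage on the line
    }
    staged.push_back(value);
  }
  table->insert(table->end(), staged.begin(), staged.end());
}

// Top-level entry point: converts any exception into an error message.
bool TryLoad(std::istream &in, std::vector<int> *table, std::string *error) {
  try {
    LoadInts(in, table);
    return true;
  } catch (const std::exception &e) {
    *error = e.what();
    return false;
  }
}
```

The design choice here is that only the outermost caller catches, so the
intermediate code needs no error plumbing, and cleanup falls out of the
staging buffer going out of scope.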

5: Our type system also needs major surgery to make it easier to add new
types. Clean UDF support is also missing.

Other thoughts?

Cheers,
Jignesh
