Re: List of potential work to do on Quickstep

Harshad Deshmukh Tue, 23 Aug 2016 11:33:12 -0700

Hi Jignesh,

Thanks for sending the list. I want to share an update on point 1.

At present I am working on partitioned aggregation, which builds on top of the QUICKSTEP-28 and QUICKSTEP-29 JIRA issues. As the first step toward this goal, I have created the QUICKSTEP-43 JIRA issue (and a corresponding GitHub PR), in which we create a new operator to destruct the aggregation state (similar to the destroy hash table operator). This operator will be useful when the finalize step in aggregation is parallel, because the shared state can then be destructed only once the finalize phase is complete.
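
To illustrate the ordering constraint, here is a minimal sketch, not the actual QUICKSTEP-43 code. AggregationState, FinalizePartition and FinalizeThenDestroy are hypothetical names, and plain std::thread workers stand in for Quickstep's scheduler. The point is only that the destroy step must be a separate step that runs after every parallel finalize task has completed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the shared aggregation state: one hash table
// (group key -> running sum) per partition.
struct AggregationState {
  std::vector<std::unordered_map<int, long>> partitions;
};

// Finalizes a single partition. It only reads its own partition, so
// different partitions can be finalized concurrently.
long FinalizePartition(const AggregationState &state, std::size_t p) {
  long sum = 0;
  for (const auto &group : state.partitions[p]) {
    sum += group.second;
  }
  return sum;
}

// Finalizes all partitions in parallel, then destroys the shared state.
// Returns the sum over all groups; 'state' is null on return.
long FinalizeThenDestroy(std::unique_ptr<AggregationState> &state) {
  assert(state != nullptr);
  const std::size_t n = state->partitions.size();
  std::vector<long> partial(n, 0);
  std::vector<std::thread> workers;
  for (std::size_t p = 0; p < n; ++p) {
    // Each worker writes only to its own slot of 'partial'.
    workers.emplace_back([&state, &partial, p] {
      partial[p] = FinalizePartition(*state, p);
    });
  }
  for (std::thread &w : workers) {
    w.join();  // The finalize phase is complete only after every join.
  }
  state.reset();  // The separate "destroy" step: safe only at this point.
  long total = 0;
  for (long v : partial) {
    total += v;
  }
  return total;
}
```

With a single-threaded finalize, the state could simply be freed at the end of that one task. With several finalize tasks sharing the state, no individual task knows it is the last one, which is why a dedicated destroy step is convenient.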


On 08/22/2016 08:23 PM, J Patel wrote:

Hi folks,

Here is a list of features that would be good for the community to work on.
Feel free to add or comment on this list.

1: Improve handling of aggregation: Aggregate handling in Quickstep is slow
as a separate hash table is being built for each aggregate. PR
https://github.com/apache/incubator-quickstep/pull/90 is a step in fixing
this, but there is more to be done, including increasing the space
efficiency of the hash table, improving the finalize operation (which is
single-threaded), and considering partitioning (so that finalize can be
parallelized).

2: The use of ColumnVectors is very expensive as it involves a full extra
read and write of data, and results in a bad memory access pattern. That
design needs to be rethought/refactored. Nav has suggested using an
iterator model vs. accessors, and that is a good idea. We can probably go
beyond that and think of defining patterns for taking an input, applying a
predicate, and applying a projection (copy). Any ideas here are welcome.

3: We have Bloom filters, and they need to be optimized to work with joins.
Jianqiao is working on this.

4: Error handling in the system can be improved. Here we need to consider
whether to use error return codes or the C++ throw/catch mechanism. Right
now we use a mix of both. I am starting to turn in favor of throw/catch as
that way we at least have a way of catching the error at the top (rather
than crashing). We can then refactor the code to add entire throw/catch
chains. Right now the most serious error handling that is lacking, IMHO, is
when we are loading a large file and there is a corrupted tuple near the
end. The system crashes after making the user wait, and there is no
cleanup.

5: Our type system also needs major surgery to make it easier to add new
types. Clean UDF support is also missing.
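
For item 2, a minimal sketch of the "take an input, apply a predicate, apply a projection" pattern. It assumes a plain std::vector as a stand-in for a storage block. FilterAndProject and ProjectedType are hypothetical names, not Quickstep APIs. The sketch only shows the filter and the projection fused into one pass, so no intermediate column is written and then re-read.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Result type of applying Projection to a const T& (hypothetical helper).
template <typename T, typename Projection>
using ProjectedType = typename std::decay<
    decltype(std::declval<Projection &>()(std::declval<const T &>()))>::type;

// Scans 'input' once; every value that satisfies 'pred' is projected with
// 'proj' and appended directly to the output.
template <typename T, typename Predicate, typename Projection>
std::vector<ProjectedType<T, Projection>> FilterAndProject(
    const std::vector<T> &input, Predicate pred, Projection proj) {
  std::vector<ProjectedType<T, Projection>> output;
  for (const T &value : input) {
    if (pred(value)) {
      output.push_back(proj(value));
    }
  }
  assert(output.size() <= input.size());
  return output;
}
```

Because the predicate and projection are template parameters, the compiler can inline them into the scan loop. Whether this beats the current ColumnVector path in practice would need to be measured.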

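For item 4, a minimal sketch of catching a load error at the top and cleaning up, under loud assumptions: tuples are single integers, one per line, and MalformedTuple, ParseTuple and LoadAll are hypothetical names rather than existing Quickstep code. Rows are staged in a local buffer and published only if every tuple parses, so a corrupted tuple near the end leaves the table untouched instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type for a tuple that cannot be parsed.
struct MalformedTuple : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Parses one line as an integer tuple; throws MalformedTuple on bad input.
int ParseTuple(const std::string &line) {
  std::size_t pos = 0;
  int value = 0;
  try {
    value = std::stoi(line, &pos);
  } catch (const std::exception &) {
    throw MalformedTuple("not an integer: " + line);
  }
  if (pos != line.size()) {
    throw MalformedTuple("trailing characters: " + line);
  }
  return value;
}

// Loads every line into 'table', or loads nothing at all. On failure it
// returns false and stores the reason in 'error'.
bool LoadAll(const std::vector<std::string> &lines, std::vector<int> *table,
             std::string *error) {
  assert(table != nullptr && error != nullptr);
  std::vector<int> staged;  // Freed automatically if we bail out.
  try {
    for (const std::string &line : lines) {
      staged.push_back(ParseTuple(line));
    }
  } catch (const MalformedTuple &e) {
    *error = e.what();  // Caught at the top: report instead of crashing.
    return false;
  }
  table->insert(table->end(), staged.begin(), staged.end());
  return true;
}
```

Staging everything in memory is only for illustration. For a large file, a real loader would more likely roll back the blocks it had already written.
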
Other thoughts?

Cheers,
Jignesh


--
Thanks,
Harshad
