Yingyi Bu has posted comments on this change.

Change subject: RangeGenerator aggfunc for the numeric/asciiString datatype 
based on parallel streaming histogram.
......................................................................


Patch Set 35:

(18 comments)

Here are some high-level comments:

1. Let partition number be the second argument of the range generation function;

2. Use JDK library as much as possible, e.g., binary search and priority queue;

3. Move histogram generation code from Hyracks to Asterix and remove type enums 
in the histogram processing code.

Detailed comments are inlined.

https://asterix-gerrit.ics.uci.edu/#/c/806/35/asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg_double/rg_double.3.query.aql
File 
asterixdb/asterix-app/src/test/resources/runtimets/queries/aggregate/rg_double/rg_double.3.query.aql:

Line 20: set partitions '2'
can "partitions" be an argument of the function?


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IHeapList.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IHeapList.java:

Line 28:     public void maintainHeap() throws HyracksDataException;
revisit the possibility of using priority queue and not exposing xxxHeap as 
public methods,  since xxxHeap seems to be implementation details rather than 
abstractions.


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IHistogram.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IHistogram.java:

Line 38:         SHORT,
Those types are not AsterixDB types.  
Can we move all histogram code into asterixdb-runtime codebase and only handle 
asterixdb types.


Line 52:     public void appendPair(E item, int count, boolean soft) throws 
HyracksDataException;
What's the difference between addPair and appendPair?  Do we need both at 
abstraction level?

What does soft mean?  Is it sth. necessary at the abstraction level?


Line 54:     public List<Entry<E, Integer>> generate(boolean isGlobal) throws 
HyracksDataException;
Must there be an final step to call generate? Can generate be called multiple 
times?  If it can be called multiple times, can we use "get" instead of 
"generate"?


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IQuantileList.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/IQuantileList.java:

Line 29:     public GenericQuantile<K, V> get(int index);
Can you add some high-level annotation to describe what each method means?

Are all of those methods needed at the abstraction level?


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/AbstractQuantileList.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/AbstractQuantileList.java:

Line 34:     private List<GenericQuantile<Double, Integer>> bins;
Use a fixed size array?


Line 38:     protected DominantQuantile<Number> heapHead;
Use Java's PriorityQueue?


Line 40:     protected Random prng;
prng is not used in this abstract class?

Move this to subclasses?  It might end up with duplications across different 
subclasses, but readability seems better to me.


Line 46:         prng = new Random(31183);
Why is the random seed a fixed number?
Usually it's sth. like System.nanoTime().


Line 60:                 } else {
Use JDK library binarySearch?


Line 102:     public void add(Double key, Integer value) {
Double->double?


Line 120:     public void add(Double key) throws HyracksDataException {
Double->double?


Line 125:         else if (bins.size() < cardinality) {
is the code formatted correctly? "else if" should be appended to the previous 
line.


Line 175:     public void setKey(int index, Double key) {
Double->double?


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/DominantQuantile.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/DominantQuantile.java:

Line 70:         return super.hashCode();
This doesn't seem right. hashCode implementation should be consistent with 
equals.

E.g.:
a.equals(b) --> a.hashCode() == b.hashCode()


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/GenericQuantile.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/GenericQuantile.java:

Line 72:     public int hashCode() {
hashCode implementation should be consistent with equals().


https://asterix-gerrit.ics.uci.edu/#/c/806/35/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/MinSumCountQuantileList.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/range/structures/MinSumCountQuantileList.java:

Line 78:     public void increase(int index, Double key) throws 
HyracksDataException {
Double -> double?
Why do you always need boxed "Double key"?


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/806
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I450d0962fbeacfb2b6ab9fae0750f025ef17ba01
Gerrit-PatchSet: 35
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Wenhai Li <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Preston Carman <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Wenhai Li <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-HasComments: Yes

Reply via email to