Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1566, ASTERIXDB-1628: Fixed External Hash Group By to 
conform to the memory budget
......................................................................


Patch Set 9:

(25 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1056/9/asterixdb/asterix-docker/docker/asterix-configuration.xml
File asterixdb/asterix-docker/docker/asterix-configuration.xml:

Line 251:       more efficiently using pointers. Actual Group-by value is 
stored in data table.
> "Group-by value" -> "aggregate values"
Per our discussion, this parameter will not be used.


Line 253:       the same as the given value divided by 40 bytes. (E.g., 16MB / 
40 = 419,430).
> Can you add the explanation of why 40 is the number into the comments?
Per our discussion, this parameter will not be used.


https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 91:         final ITuplePartitionComputer tpc1 = new 
FieldHashPartitionComputerFamily(intermediateResultKeys,
> better variable name. e.g.,
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/ISpillableTable.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/ISpillableTable.java:

Line 55:      * @return
> annotate @return.
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java:

Line 79:                         result = table.insert(accessor, i);
> flip the else branch as the if branch, and hence not nest more than 4 level
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/ISerializableTable.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/ISerializableTable.java:

Line 43:     boolean isGarbageCollectioNeeded();
> This method is never called. Do we really need it?
Actually, this will be called during a partition spill to the disk from the 
caller due to a budget exceed.


Line 46:             throws HyracksDataException;
> Why does this need to be expose to the outside?
Because we do a garbage collection need check when we spill a data partition to 
the disk, not after each insertion. So, if the check says the garbage 
collection is required, we need to have a way to execute the garbage collection.


https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SerializableHashTable.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SerializableHashTable.java:

Line 47:     private static int INVALID_VALUE = -1;
> 1. static final
Done


Line 262:                     
lastContentFrame.writeInt(lastOffsetInCurrentFrame + i, INVALID_VALUE);
> Simplify that using Arrays.fill(...).
Done


Line 384:         for (int i = 0; i < frameCapacity; i++)
> Simplify that using one line:
Done


Line 413:         int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / 
frameSize);
> MAJOR SonarQube violation:
Done


Line 413:         int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / 
frameSize);
> CRITICAL SonarQube violation:
Done


Line 413:         int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / 
frameSize);
> Pls address this.
Done


Line 414:         int numberOfContentFrame = (int) Math
> Pls address this.
Done


Line 414:         int numberOfContentFrame = (int) Math
> MAJOR SonarQube violation:
Done


Line 415:                 .ceil((getNumberOfEntryInSlot() * 2 * getUnitSize() * 
tableSize) / frameSize);
> CRITICAL SonarQube violation:
Done


Line 439:         // For calculating the original hash value
> wastedIntSpaceCount is not modified after executeGarbageCollection?
Done


Line 440:         TuplePointer tmpTuplePointerForGC = new TuplePointer();
> reuse the object.
Done


Line 445:         int slotCapacity = 0;
> MAJOR SonarQube violation:
Done


Line 446:         int slotUsedCount = 0;
> MAJOR SonarQube violation:
Done


Line 447:         int capacityInIntCount = 0;
> MAJOR SonarQube violation:
Done


Line 462:             slotCapacity = INVALID_VALUE;
> MAJOR SonarQube violation:
Done


Line 463:             slotUsedCount = INVALID_VALUE;
> MAJOR SonarQube violation:
Done


Line 486:                         if ((currentWriteIntOffsetInPageForGC + 4) > 
frameCapacity
> MAJOR SonarQube violation:
Done


Line 488:                             // Swipe the region that can't be used.
> Using Arrays.fill(...) to simplify that.
Done


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 9
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng....@gmail.com>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wenhai Li <lwhaym...@yahoo.com>
Gerrit-Reviewer: Yingyi Bu <buyin...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to