Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1566: Fixed External Hash Group By to conform to the 
memory budget
......................................................................


Patch Set 5:

(14 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml
File asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml:

Line 71:     <value>64KB</value>
should the user care about the internal hashtable memory limit?
I thought it could be automatically changed.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java
File 
asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java:

Line 44:     private static final long 
COMPILER_EXTERNAL_GROUP_TABLE_SIZE_DEFAULT = 16 << 20; // 16MB
just want to clarify, is this the cardinality value? or the physical bytes?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-docker/docker/asterix-configuration.xml
File asterixdb/asterix-docker/docker/asterix-configuration.xml:

Line 239:     <name>compiler.groupmemory</name>
same as Chen's thought, I think the `groupbymemory` would be more meaningful.


Line 250:       In an external hash group-by operation, a hash table is used to 
insert/traverse the group-by value
If I were the user, I would not read a description like this long :-)


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java:

Line 30:     // Return the number of entire frames that are allocated to this 
buffer manager.
use JavaDoc for Interface?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java:

Line 52:         // If we couldn't find an already spilled partition, or it is 
too small to flush that one,
personal opition: a large chunk of comments inside the code breaks the code 
readability. It's will be very hard for readers to navigate the actual code. 

It's better to move these two chunks on top of the function as a function 
description.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java:

Line 50:     // total number of frames that are allocated
code tells the comment.


Line 175:         } else {
don't need `else`?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 68:         int expectedByteSizeOfHashTableForGroupBy = 
SerializableHashTable.getUnitSize()
should it be convenient to have a corresponding function inside the 
`SerializableHashTable`?


Line 179:                 if (isUsedNumFramesExceedBudget()) {
it's a ideal time to check :-)


Line 254:                 if (getNumFrames() <= framesLimit) {
`return getNumFrames() > framesLimit` ?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java:

Line 64:                         // If the result type is 
SUCCESS_BUT_EXCEEDS_BUDGET,
Again, the large chunk of comment breaks the code flow. It's better to merge 
them into as one paragraph and add it on top the function.


Line 74:                         // FAIL case - try to insert again
code tells.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java
File 
hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java:

Line 179:         // With the above table size, there can be 1001 * 40 / 256 (= 
156) number of header frames in Hash Table.
this is the function description.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Wenhai Li <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-HasComments: Yes

Reply via email to