Jianfeng Jia has posted comments on this change. Change subject: Implemented the memory-bounded HashGroupby and HashJoin for BigObject ......................................................................
Patch Set 7: (11 comments) @Yingyi, I'd like to address the IFrame related changes to a separate patch. Otherwise this one will become too big. https://asterix-gerrit.ics.uci.edu/#/c/398/7/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java: Line 256: true); > Do we always need to sort the output? No. We don't need to sort if using ExternalHashGroupby operator. Did you mean that "true"? That "true" is about "isFinalStage". https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/comm/FixedSizeFrame.java File hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/comm/FixedSizeFrame.java: Line 43: public ByteBuffer getBuffer() { > What is the user-level abstraction after this change? I've updated all the IFrame to the IFrameReader. Haven't done it for the IFrameWriter. I think it should be a separate patch, because it will be a big change again. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAppenderWrapper.java File hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAppenderWrapper.java: Line 45: public IFrameTupleAppender getFrameTupleAppender() { > This method is never called by others. Done https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AbstractTupleBufferAccessor.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AbstractTupleBufferAccessor.java: Line 132: throw new IllegalAccessError("Should never call this reset"); > Again, about the abstraction, should the user-code, e.g., operators and fun I will address this as a separate CR. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/DeletableFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/DeletableFramePool.java: Line 54: public ByteBuffer allocateFrame(int frameSize) throws HyracksDataException { > Should this return IFrame, if IFrame is the abstraction that hides all the Will address this issue in another CR. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/FrameFreeSlotLastFit.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/FrameFreeSlotLastFit.java: Line 45: frameSpaces = new FrameSpace[initial_frame_number]; > variable name should be camel case. Done Line 55: for (int i = size - 1; i >= 0; i--) { > Can that be a binary search, or based on a SortedMap? This why it is called "lastFit". BinarySearch is introduced in "bestFit" code. Actually, based on my experiment the lastFit is the best. BinarySearch will introduce more overhead. Especially when we assume most of the case is the normal object case, the size of each Frame is equal, then picking the last one is the most efficient way. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IDeletableFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IDeletableFramePool.java: Line 26: void deAllocateBuffer(ByteBuffer buffer); > Should that be Will cover IFrame related changes in a separate patch. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IFramePool.java: Line 40: ByteBuffer allocateFrame(int frameSize) throws HyracksDataException; > Should this return IFrame? Will cover the IFrame related change in a separate patch https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java: Line 61: void flushPartition(int pid, IFrameWriter writer) throws HyracksDataException; > Please add a comment here about what this function does. How does it differ Done https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/NonSerializableHashTable.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/NonSerializableHashTable.java: Line 24: public class NonSerializableHashTable implements IBatchDeleteTable<TuplePointer> { > Why do we switch to this NoSerializable implementation for ExternalGroupBy? removed -- To view, visit https://asterix-gerrit.ics.uci.edu/398 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I248f3a374fdacad7d57e49cf18d8233745e55460 Gerrit-PatchSet: 7 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Pouria Pirzadeh <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
