Yingyi Bu has posted comments on this change. Change subject: Impelemented the memory-bounded HashGroupby and HashJoin for BigObject ......................................................................
Patch Set 7: (10 comments) Here are some high-level comments: 1. What is the current abstraction for user code, e.g., operators and functions? I prefer to have IFrame be the only abstraction that user code need to deal with. It seems to me that we should wrap a byte buffer inside IFrame and IFrame has the ByteBuffer interface but hides the details of re-sizing. Is it do-able? 2. It looks to me switching ExternalGroupByOperatorDescriptor to use an object based hash table implementation will lead to performance regressions. But maybe I have a wrong understanding of the code. Detailed comments are inlined. https://asterix-gerrit.ics.uci.edu/#/c/398/7/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java File algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java: Line 23: import java.util.List; Please use the AsterixDB codestyle as described in https://asterixdb.incubator.apache.org/dev-setup.html. I guess both Till and Ian know how to set the style up in IntelligentJ:-) Line 256: true); Do we always need to sort the output? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/comm/FixedSizeFrame.java File hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/comm/FixedSizeFrame.java: Line 43: public ByteBuffer getBuffer() { What is the user-level abstraction after this change? Will an operator implementor still see ByteBuffer or s/he should only see IFrame? My naive guess is that a user doesn't need to know ByteBuffer any more, but that seems not true? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAppenderWrapper.java File hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAppenderWrapper.java: Line 45: public IFrameTupleAppender getFrameTupleAppender() { This method is never called by others. Should we remove it? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AbstractTupleBufferAccessor.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AbstractTupleBufferAccessor.java: Line 132: throw new IllegalAccessError("Should never call this reset"); Again, about the abstraction, should the user-code, e.g., operators and functions see ByteBuffer at all? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/DeletableFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/DeletableFramePool.java: Line 54: public ByteBuffer allocateFrame(int frameSize) throws HyracksDataException { Should this return IFrame, if IFrame is the abstraction that hides all the details regarding to dynamic-resizing of ByteBuffers. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/FrameFreeSlotLastFit.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/FrameFreeSlotLastFit.java: Line 55: for (int i = size - 1; i >= 0; i--) { Can that be a binary search, or based on a SortedMap? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IDeletableFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IDeletableFramePool.java: Line 26: void deAllocateBuffer(ByteBuffer buffer); Should that be deAllocateFrame(IFrame frame)? I prefer to have only one abstraction for user code, either ByteBuffer or IFrame. https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IFramePool.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IFramePool.java: Line 40: ByteBuffer allocateFrame(int frameSize) throws HyracksDataException; Should this return IFrame? https://asterix-gerrit.ics.uci.edu/#/c/398/7/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/NonSerializableHashTable.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/NonSerializableHashTable.java: Line 24: public class NonSerializableHashTable implements IBatchDeleteTable<TuplePointer> { Why do we switch to this NoSerializable implementation for ExternalGroupBy? It seems to me this class will have object creation problems... -- To view, visit https://asterix-gerrit.ics.uci.edu/398 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I248f3a374fdacad7d57e49cf18d8233745e55460 Gerrit-PatchSet: 7 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Ian2 Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Pouria Pirzadeh <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
