Jianfeng Jia has posted comments on this change. Change subject: Implemented the memory-bounded HashGroupby and HashJoin for BigObject ......................................................................
Patch Set 18: (11 comments) https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAccessor.java File hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/comm/io/FrameTupleAccessor.java: Line 149: sb.append("FailedToDeserializeField" + j); > "FailedToDeserializeField" -> "Failed to deserialize field " Done https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AtMostOneFrameForSpilledPartitionPolicy.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/AtMostOneFrameForSpilledPartitionPolicy.java: Line 24: public class AtMostOneFrameForSpilledPartitionPolicy { > It's a bit difficult for me to understand this class name. changed to `PreferToSpillFullyOccupiedFramePolicy` Line 45: // try to flush the in memory partition. > "the" --> "an" Done Line 65: } > It looks the code in findInMemPartitionWithMaxMemoryUsage and findSpilledPa Good suggestion! Done. https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java: Line 93: * Insert the tuple from the tupleAccessor into the already spilled partition. The spilled ones can occupy at most one frame. > What does the last sentence mean here? The already spilled partition can not compete with the in-memory ones for more frames. It can only use one frame to buffer the data. It also could be zero. Line 132: * Clear the particular partition. > "Clear" --> "Clear the memory occupation of " Done https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java: Line 104: partitionArray[partitionId].reset(); > Should we reserve one empty flushed page in partitionArray[partitionId].res Since we guarantee the spilled partition can't have more than one frame, the thrashing won't exist anymore. There still might be the chance to spill the half-full frame when the partition (pid) didn't have any data. In that case, the in-memory part will be spilled. Since the spilled one can't never get the more than one frame back, the chance of having this kind half-ful spilling is < |partition| https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java: Line 1: /* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file > license header changed Done https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupWriteOperatorNodePushable.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupWriteOperatorNodePushable.java: Line 96: writer.fail(); > In case there is a failure in its data consumer, we should still cleanup le Done https://asterix-gerrit.ics.uci.edu/#/c/398/18/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java File hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java: Line 101: private TuplePointer tempPtr = new TuplePointer(); // this is a reusable object to store the pointer,which is not used anywhere, > WS Done Line 241: private void flushAndClearSpilledPartition(SIDE whichSide) throws HyracksDataException { > "flushAndClearSpilledPartition"-->"closeAllSpilledPartitions"? Done -- To view, visit https://asterix-gerrit.ics.uci.edu/398 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I248f3a374fdacad7d57e49cf18d8233745e55460 Gerrit-PatchSet: 18 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Pouria Pirzadeh <[email protected]> Gerrit-Reviewer: Preston Carman <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
