[ https://issues.apache.org/jira/browse/TEZ-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated TEZ-972:
-------------------------------

    Resolution: Fixed
 Fix Version/s: 0.4.0
        Status: Resolved  (was: Patch Available)

Committed to master.

> Shuffle Phase - optimize memory usage of empty partition data in DataMovementEvent
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-972
>                 URL: https://issues.apache.org/jira/browse/TEZ-972
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>             Fix For: 0.4.0
>
>         Attachments: TEZ-972-v1.patch, TEZ-972-v2.patch, TEZ-972-v3.patch, TEZ-972.4.txt
>
>
> Empty partition details are stored in a byte[] in compressed form and sent
> via DataMovementEvent in the shuffle phase. Quick standalone tests reveal
> that a BitSet would be more efficient than compressing the byte[].
>
> PartitionSize is the number of partitions; the other columns are sizes in bytes.
>
> PartitionSize  BitSetSize  CompressedBitSetSize  NormalByteArrayCompressed
>             1           1                     9                          9
>           101          13                    22                         42
>           201          26                    37                         62
>           301          38                    49                         76
>            ..
>          1001         126                   137                        197
>            ..
>          2001         251                   262                        374
>          4001         501                   512                        686
>          8001        1001                  1012                       1330
>         16001        2001                  1979                       2569
>         32001        4001                  3885                       5000
>
> - These figures were obtained by treating randomly chosen bit positions as
>   empty partitions.
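The comparison in the description can be approximated with a minimal Java sketch. It assumes that the "normal" encoding is one byte per partition and that compression is plain java.util.zip Deflate; the issue names neither the codec nor the fraction of empty partitions used in the standalone test, so the printed sizes will not reproduce the table exactly. The class and method names (EmptyPartitionEncoding, toBitSetBytes, toPlainBytes, deflate) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical sketch: names and the Deflate codec are assumptions, not Tez code.
public class EmptyPartitionEncoding {

  // Bitset encoding: bit i of byte i / 8 is set when partition i is empty.
  static byte[] toBitSetBytes(boolean[] empty) {
    byte[] out = new byte[(empty.length + 7) / 8]; // ceil(n / 8) bytes
    for (int i = 0; i < empty.length; i++) {
      if (empty[i]) {
        out[i / 8] |= (byte) (1 << (i % 8));
      }
    }
    return out;
  }

  // Decodes toBitSetBytes output back into per-partition flags.
  static boolean[] fromBitSetBytes(byte[] bytes, int numPartitions) {
    boolean[] empty = new boolean[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      empty[i] = (bytes[i / 8] & (1 << (i % 8))) != 0;
    }
    return empty;
  }

  // Assumed "normal" encoding: one byte per partition (1 = empty, 0 = not empty).
  static byte[] toPlainBytes(boolean[] empty) {
    byte[] out = new byte[empty.length];
    for (int i = 0; i < empty.length; i++) {
      out[i] = (byte) (empty[i] ? 1 : 0);
    }
    return out;
  }

  // Deflate-compresses the input with java.util.zip default settings.
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[1024];
      while (!deflater.finished()) {
        int written = deflater.deflate(chunk);
        out.write(chunk, 0, written);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  public static void main(String[] args) {
    Random random = new Random(42); // fixed seed for repeatable output
    for (int n : new int[] {1, 101, 1001, 8001, 32001}) {
      boolean[] empty = new boolean[n];
      for (int i = 0; i < n; i++) {
        empty[i] = random.nextBoolean(); // assumption: roughly 50% empty partitions
      }
      byte[] bits = toBitSetBytes(empty);
      System.out.printf(
          "PartitionSize=%d, BitSetSize=%d, CompressedBitSetSize=%d, NormalByteArrayCompressed=%d%n",
          n, bits.length, deflate(bits).length, deflate(toPlainBytes(empty)).length);
    }
  }
}
```

The uncompressed bitset always costs ceil(PartitionSize / 8) bytes, which matches the BitSetSize column. Compression adds a fixed header overhead, which is consistent with the table showing CompressedBitSetSize below BitSetSize only at the largest partition counts.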
>
> It is not possible to use JDK 1.6's BitSet directly, as it does not support
> the valueOf() and toByteArray() functions. The suggestion is to have a
> Tez-specific BitSet (until Tez moves to JDK 1.7) and to make the compression
> a job configuration option.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
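The JDK 1.6 limitation raised in the issue can be worked around with a small helper. The following is a hypothetical sketch, not the class committed for TEZ-972. It reproduces the byte layout that Java 7 documents for BitSet.toByteArray() and BitSet.valueOf(byte[]) (bit n stored in byte n / 8 at position n % 8) using only methods that already exist in JDK 1.6, such as nextSetBit(). The class name BitSetCompat is an assumption.

```java
import java.util.BitSet;

// Hypothetical JDK 1.6-compatible stand-ins for Java 7's BitSet.toByteArray()
// and BitSet.valueOf(byte[]). Not Tez's actual API.
public final class BitSetCompat {
  private BitSetCompat() {}

  // Same layout as Java 7's bits.toByteArray(): little-endian, length (length() + 7) / 8.
  public static byte[] toByteArray(BitSet bits) {
    byte[] bytes = new byte[(bits.length() + 7) / 8];
    // nextSetBit has existed since JDK 1.4, so only set bits are visited.
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      bytes[i / 8] |= (byte) (1 << (i % 8));
    }
    return bytes;
  }

  // Same semantics as Java 7's BitSet.valueOf(bytes).
  public static BitSet valueOf(byte[] bytes) {
    BitSet bits = new BitSet(bytes.length * 8);
    for (int i = 0; i < bytes.length * 8; i++) {
      if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
        bits.set(i);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    BitSet empty = new BitSet();
    empty.set(3);  // partition 3 is empty
    empty.set(10); // partition 10 is empty
    byte[] encoded = toByteArray(empty);
    // prints "2 bytes, round trip ok: true"
    System.out.println(encoded.length + " bytes, round trip ok: " + valueOf(encoded).equals(empty));
  }
}
```

Whether the resulting byte[] is additionally compressed could then be decided by the proposed job configuration; no specific Tez property name is assumed here.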