[ 
https://issues.apache.org/jira/browse/TEZ-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-972:
------------------------------------

    Assignee: Rajesh Balamohan

> Shuffle Phase - optimize memory usage of empty partition data in 
> DataMovementEvent
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-972
>                 URL: https://issues.apache.org/jira/browse/TEZ-972
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>
> Empty partition details are stored in a byte[] in compressed format and sent 
> via DataMovementEvent in the shuffle phase.  Quick standalone tests reveal that 
> a BitSet would be more efficient than compressing the byte[].  
> PartitionSize  BitSetSize  CompressedBitSetSize  NormalByteArrayCompressed
>             1           1                     9                          9
>           101          13                    22                         42
>           201          26                    37                         62
>           301          38                    49                         76
> ..
>          1001         126                   137                        197
> ..
>          2001         251                   262                        374
>          4001         501                   512                        686
>          8001        1001                  1012                       1330
>         16001        2001                  1979                       2569
>         32001        4001                  3885                       5000
> These numbers are based on treating random bit positions as empty partitions.
> It is not possible to use JDK 1.6's BitSet directly, as it does not support 
> the valueOf() and toByteArray() functions.  The suggestion is to have a 
> Tez-specific BitSet (until Tez moves to JDK 1.7) and to make the compression 
> a job configuration option.
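
As a rough illustration of the Tez-specific BitSet idea in the description, the sketch below packs a java.util.BitSet into a byte[] and back using only methods that JDK 1.6 already provides. The class name EmptyPartitionBits and both helper methods are hypothetical and are not taken from the Tez code base. The byte layout (bit i stored in byte i/8 at position i%8) is chosen to mirror what JDK 1.7's BitSet.toByteArray() produces.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Tez: converts between BitSet and byte[]
// without relying on the JDK 1.7 methods valueOf() and toByteArray().
public class EmptyPartitionBits {

    // Pack the set bits into bytes: bit i goes to byte i/8, position i%8.
    static byte[] toByteArray(BitSet bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out[i / 8] |= (byte) (1 << (i % 8));
        }
        return out;
    }

    // Rebuild the BitSet from the packed bytes.
    static BitSet fromByteArray(byte[] bytes) {
        BitSet bits = new BitSet(bytes.length * 8);
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Assume partitions 0, 9 and 100 (out of 101) are empty.
        BitSet empty = new BitSet(101);
        empty.set(0);
        empty.set(9);
        empty.set(100);

        byte[] packed = toByteArray(empty);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("round trip ok: " + fromByteArray(packed).equals(empty));
    }
}
```

The packed array only extends to the highest set bit, so a receiver would still need the partition count from elsewhere. Whether to compress the packed bytes afterwards could then be controlled by the job configuration, as proposed above.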



--
This message was sent by Atlassian JIRA
(v6.2#6252)
