Rohini Palaniswamy created PIG-4656:
---------------------------------------

             Summary: Improve serialization and comparator performance in 
BinInterSedes
                 Key: PIG-4656
                 URL: https://issues.apache.org/jira/browse/PIG-4656
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.16.0


Two major optimizations can be done:
  -  PIG-1472 added multiple data types to store different sizes (byte, short, 
int). It can be simplified using WritableUtils.writeVInt. There is no 
difference for byte and short compared to current approach. But with int, it 
could be beneficial where lot of numbers could be written with 3 bytes instead 
of 4. For eg: 32768 is written using 3 bytes in with WritableUtils.writeVInt 
whereas currently 4 bytes (int) is used. 
  -  String comparison in BinInterSedesTupleRawComparator initializes String 
for comparison. Should instead compare bytes like Text.Comparator.
{code}
str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to