Rohini Palaniswamy created PIG-4656:
---------------------------------------
Summary: Improve serialization and comparator performance in
BinInterSedes
Key: PIG-4656
URL: https://issues.apache.org/jira/browse/PIG-4656
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
Fix For: 0.16.0
Two major optimizations can be done:
- PIG-1472 added multiple data types to store different sizes (byte, short,
int). This can be simplified by using WritableUtils.writeVInt. There is no
difference for byte and short compared to the current approach, but for int it
could be beneficial because many numbers could be written with 3 bytes instead
of 4. For example, 32768 is written using 3 bytes with WritableUtils.writeVInt,
whereas currently 4 bytes (int) are used.
- String comparison in BinInterSedesTupleRawComparator constructs String
objects for comparison. It should instead compare the raw bytes directly, like
Text.Comparator.
{code}
str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
{code}
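Both ideas can be illustrated with a minimal, self-contained sketch. It is not the Pig or Hadoop implementation: the class and method names (VIntAndByteCompareSketch, vIntSize, compareBytes) are hypothetical, the sizing rule is assumed to match what WritableUtils.writeVInt produces (one byte for values in [-112, 127], otherwise one prefix byte plus the minimal number of magnitude bytes), and the comparison is a plain unsigned byte-wise loop in the spirit of Text.Comparator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; none of these names exist in Pig or Hadoop.
class VIntAndByteCompareSketch {

    // Bytes needed under the assumed variable-length encoding:
    // 1 byte for [-112, 127], else 1 prefix byte + minimal magnitude bytes.
    static int vIntSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1;
        }
        if (value < 0) {
            value ^= -1L; // negatives are sized by their one's complement
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1;
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    // No String objects are allocated.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal, so the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        // Encoded size of the example value from the description.
        System.out.println("vIntSize(32768) = " + vIntSize(32768));

        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apply".getBytes(StandardCharsets.UTF_8);
        // Negative result means x sorts before y.
        System.out.println("compare = " + compareBytes(x, 0, x.length, y, 0, y.length));
    }
}
```

One caveat with the byte-wise approach: unsigned UTF-8 byte order follows Unicode code point order, while String.compareTo orders by UTF-16 code units, so the two can disagree for strings containing supplementary characters. Whether that matters for existing sorted data would need to be checked.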