[
https://issues.apache.org/jira/browse/PIG-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542802#comment-14542802
]
Rohini Palaniswamy commented on PIG-4506:
-----------------------------------------
The fix might be easy, but it is wasteful: it spends one extra byte per
BigInteger and BigDecimal.
{code}
case DataType.BIGINTEGER:
    out.writeByte(DataType.BIGINTEGER);
    writeDatum(out, ((BigInteger)val).toByteArray());
    break;
case DataType.BIGDECIMAL:
    out.writeByte(DataType.BIGDECIMAL);
    writeDatum(out, ((BigDecimal)val).toString());
{code}
Instead of writing DataType.BIGINTEGER + length of byte array + byte array,
the code writes DataType.BIGINTEGER + DataType.BYTEARRAY + length of byte
array + byte array. For BigDecimal it writes DataType.BIGDECIMAL +
DataType.CHARARRAY/DataType.BIGCHARARRAY + short/int length of the string
bytes + string bytes. We should get rid of the nested DataType.BYTEARRAY and
DataType.CHARARRAY/DataType.BIGCHARARRAY tags. Delegating to writeDatum makes
for easy coding, but it is inefficient.
Can we extract the writing and reading of a byte array and its length into
shared code and reuse that, instead of calling writeDatum(DataByteArray)? For
BigDecimal, we can always do out.writeShort(length), as the length of the
BigDecimal string should not be > 65535.
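A rough sketch of the layout I have in mind, not a patch against
DataReaderWriter. The class name, method names, and tag constants below are
hypothetical stand-ins (the real tags live in DataType), and the int length
prefix for BigInteger is just one possible choice. Each value is written as
its own type tag + length + raw bytes, with no nested BYTEARRAY/CHARARRAY tag:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Pig.
final class CompactBigNumberCodec {
    // Stand-in type tags for this sketch; Pig's real values come from DataType.
    static final byte BIGINTEGER = 65;
    static final byte BIGDECIMAL = 70;

    // Layout: tag (1 byte) + int length (4 bytes) + two's-complement bytes.
    static void writeBigInteger(DataOutput out, BigInteger val) throws IOException {
        byte[] bytes = val.toByteArray();
        out.writeByte(BIGINTEGER);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Assumes the caller has already consumed the type tag.
    static BigInteger readBigInteger(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new BigInteger(bytes);
    }

    // Layout: tag (1 byte) + unsigned short length (2 bytes) + UTF-8 string bytes.
    static void writeBigDecimal(DataOutput out, BigDecimal val) throws IOException {
        byte[] bytes = val.toString().getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 65535) {
            // Guard for the assumption that the string always fits in a short.
            throw new IOException("BigDecimal string too long: " + bytes.length);
        }
        out.writeByte(BIGDECIMAL);
        out.writeShort(bytes.length); // low 16 bits; read back as unsigned
        out.write(bytes);
    }

    // Assumes the caller has already consumed the type tag.
    static BigDecimal readBigDecimal(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readUnsignedShort()];
        in.readFully(bytes);
        return new BigDecimal(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Compared with going through writeDatum, this saves the one nested tag byte per
value. The readers would be called from the matching case in readDatum after
the type byte has been read.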
> binstorage fails to write biginteger
> ------------------------------------
>
> Key: PIG-4506
> URL: https://issues.apache.org/jira/browse/PIG-4506
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Reporter: Savvas Savvides
> Assignee: Savvas Savvides
> Fix For: 0.15.0
>
> Attachments: PIG-4506-1.patch
>
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> When trying to store a biginteger using binstorage the following error is
> issued (The error might manifest elsewhere too):
> java.lang.RuntimeException: Unexpected data type -1 found in stream
> This is caused by a bug in the writeDatum method of the DataReaderWriter.java
> class. When writeDatum is called with a BigInteger as an argument, the
> BigInteger is converted to a byte[] and the writeDatum method is recursively
> called on the byte[]. writeDatum cannot handle byte[] objects but instead
> expects DataByteArray objects.
> Suggested fix - wrap byte[] to DataByteArray:
> change this line:
> _writeDatum(out, ((BigInteger)val).toByteArray());_
> to this:
> _writeDatum(out, new DataByteArray(((BigInteger)val).toByteArray()));_