[
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Kommireddi updated PIG-3110:
-------------------------------------
Attachment: PIG-3110.patch
The patch contains changes to Utf8StorageConverter and TestConversions. In
addition to the changes discussed above, I have added an early check:
{code}
if (b == null || b.length == 0) {
    return null;
}
{code}
If the input byte array is null or empty there is nothing to parse, so
returning early avoids the expensive valueOf(String s) calls.
This could be optimized further if the only remaining reason for falling back
on Double.valueOf() is to handle floating-point input. For a floating-point
number, bytesToLong and bytesToInteger currently do the following:
1. Call Integer/Long.valueOf(String).
2. If step 1 results in null, call Double.valueOf.
3. Convert the result of step 2 back to Integer/Long.
We can determine up front whether the input byte array holds a floating-point
number and skip step 1 in that case.
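To make the fallback concrete, here is a minimal sketch of the three steps as described above. It is not the actual Utf8StorageConverter code: the class and method names (FallbackSketch, toLongWithFallback) are hypothetical, and it assumes that "step 1 fails" means a NumberFormatException. It also illustrates how the value from the bug report loses precision once trailing whitespace pushes it onto the Double path.

```java
public class FallbackSketch {
    // Hypothetical stand-in for the fallback described above; not Pig's code.
    static Long toLongWithFallback(String s) {
        try {
            // Step 1: strict parse. Rejects trailing whitespace and decimals.
            return Long.valueOf(s);
        } catch (NumberFormatException e) {
            try {
                // Steps 2 and 3: parse as a double, then narrow back to long.
                // Double.valueOf ignores surrounding whitespace, and a double
                // carries only 53 bits of mantissa, so large values are rounded.
                return Double.valueOf(s).longValue();
            } catch (NumberFormatException e2) {
                // Neither parser accepted the input.
                return null;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(toLongWithFallback("1703598819951657279"));
        System.out.println(toLongWithFallback("1703598819951657279 "));
        System.out.println(toLongWithFallback("12.7"));
        System.out.println(toLongWithFallback("1234abcd"));
    }
}
```

With the trailing space, the sketch returns 1703598819951657216, the same corrupted value shown in the issue description, because the input only parses via Double.valueOf.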
Finally, the above process runs regardless of whether the input byte array is
numeric at all. That is unnecessary for strings like "1234abcd".
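One possible shape for such a pre-check is a single pass over the bytes that classifies the input before any valueOf call is made. The sketch below is only an illustration under simplifying assumptions: ASCII input, an optional leading sign, at most one decimal point, and no handling of exponents or whitespace. The names NumericScanSketch, Kind and classify are hypothetical and not part of the patch.

```java
public class NumericScanSketch {
    enum Kind { INTEGER, FLOATING_POINT, NOT_NUMERIC }

    // Hypothetical single-pass scan; decides which parser (if any) is worth calling.
    static Kind classify(byte[] b) {
        if (b == null || b.length == 0) {
            return Kind.NOT_NUMERIC;
        }
        int i = 0;
        // Allow one optional leading sign.
        if (b[0] == '-' || b[0] == '+') {
            i++;
        }
        boolean sawDigit = false;
        boolean sawDot = false;
        for (; i < b.length; i++) {
            byte c = b[i];
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c == '.' && !sawDot) {
                sawDot = true;
            } else {
                // Any other byte (letters, spaces, a second dot) ends the scan.
                return Kind.NOT_NUMERIC;
            }
        }
        if (!sawDigit) {
            return Kind.NOT_NUMERIC;
        }
        return sawDot ? Kind.FLOATING_POINT : Kind.INTEGER;
    }
}
```

Under these assumptions a caller could go straight to Double.valueOf for FLOATING_POINT input, use Integer/Long.valueOf for INTEGER input, and return null for NOT_NUMERIC input such as "1234abcd" without parsing at all.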
If all agree, we should open another JIRA and optimize these methods further.
> pig corrupts chararrays with trailing whitespace when converting them to long
> -----------------------------------------------------------------------------
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: 0.10.0
> Reporter: Ido Hadanny
> Attachments: PIG-3110.patch
>
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216) <--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279) <--- correct