[ 
https://issues.apache.org/jira/browse/ORC-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated ORC-144:
------------------------------
    Affects Version/s: 1.4.0

> PATCHED BASE Documentation Issues
> ---------------------------------
>
>                 Key: ORC-144
>                 URL: https://issues.apache.org/jira/browse/ORC-144
>             Project: Orc
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.4.0
>            Reporter: Douglas Drinka
>            Assignee: Douglas Drinka
>            Priority: Minor
>
> The documentation for Patched Base encoding has two issues.
> First, the data field description repeats "Data values (W * L bits padded to 
> the byte)...".
> Second, the example given does not trigger the encoder.  The sample data for 
> all the other encoding formats actually triggers its encoder based on the 
> logic in the Java code.  However, this example sequence is too short to reach 
> either the 90% cutoff for non-rebased data ((1.0-.9)*10 = 
> 0.99999999999999978, which floors to 0) or the 95% cutoff for rebased data.  
> At least 20 values are needed for a single patch to occur.
> I propose the following sequence:
> [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070, 2080, 2090, 2100, 2110, 
> 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
> Which encodes to [0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70, 
> 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82, 0x8c, 0x96, 0xa0, 
> 0xaa, 0xb4, 0xbe, 0xfc, 0xe8]
> Then in the description the wording should be "a length of 20 (19)".
> These samples were critical for verifying my code, and I appreciated them 
> being provided, particularly since I didn't find any unit tests in the Java 
> code that directly compare the byte output of the encoders.
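
The cutoff arithmetic in the report can be reproduced with a short sketch. The helper name `values_past_percentile` is hypothetical, and the expression `int(length * (1.0 - p))` is an assumption about how the Java writer computes the count. It is modelled only on the "(1.0-.9)*10" calculation quoted above, with Python's `int()` standing in for a Java `(int)` cast, since both truncate toward zero.

```python
def values_past_percentile(length, p):
    """Count of values allowed above the p-th percentile.

    Assumed formula, taken from the arithmetic quoted in the report;
    int() truncates toward zero like a Java (int) cast.
    """
    return int(length * (1.0 - p))


for n in (10, 20):
    print(n, values_past_percentile(n, 0.90), values_past_percentile(n, 0.95))
# prints:
# 10 0 0
# 20 1 1
```

With 10 values both cutoffs leave room for zero outliers, so no patch can be produced. With 20 values each cutoff leaves room for one.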

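The proposed byte sequence can be cross-checked by decoding it by hand. The following is a minimal sketch of a Patched Base decoder, not the ORC reader itself. All names (`WIDTHS`, `read_bits`, `decode_patched_base`, `ENCODED`) are hypothetical, the header layout and width table reflect my reading of the RLE v2 description in the ORC specification, and rounding the patch entry width up to the next encodable width is an assumption.

```python
# 5-bit width codes: 0..23 map to 1..24 bits, then the larger widths.
# (Assumed from the ORC RLE v2 width table.)
WIDTHS = list(range(1, 25)) + [26, 28, 30, 32, 40, 48, 56, 64]

ENCODED = bytes([
    0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70,
    0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82,
    0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8,
])


def read_bits(data, bit_pos, width):
    """Read `width` bits, most significant first, from bit offset `bit_pos`."""
    value = 0
    for i in range(bit_pos, bit_pos + width):
        value = (value << 1) | ((data[i // 8] >> (7 - i % 8)) & 1)
    return value


def decode_patched_base(data):
    # 4-byte header: 2 bits encoding, 5 bits width code, 9 bits length - 1,
    # 3 bits base width - 1, 5 bits patch width code,
    # 3 bits patch gap width - 1, 5 bits patch list length.
    assert data[0] >> 6 == 2, "not a Patched Base run"
    width = WIDTHS[(data[0] >> 1) & 0x1F]
    length = (((data[0] & 1) << 8) | data[1]) + 1
    base_bytes = (data[2] >> 5) + 1
    patch_width = WIDTHS[data[2] & 0x1F]
    gap_width = (data[3] >> 5) + 1
    patch_count = data[3] & 0x1F

    # Base value: big-endian, with the top bit used as a sign flag.
    base = int.from_bytes(data[4:4 + base_bytes], "big")
    sign_bit = 1 << (base_bytes * 8 - 1)
    if base & sign_bit:
        base = -(base & (sign_bit - 1))

    # Data values: `length` entries of `width` bits, padded to a byte.
    pos = (4 + base_bytes) * 8
    deltas = []
    for _ in range(length):
        deltas.append(read_bits(data, pos, width))
        pos += width
    pos = (pos + 7) // 8 * 8

    # Patch list: each entry holds a gap (high bits) and a patch (low bits).
    # Assumption: the entry width is rounded up to the next encodable width.
    entry_width = next(w for w in WIDTHS if w >= gap_width + patch_width)
    index = 0
    for _ in range(patch_count):
        entry = read_bits(data, pos, entry_width)
        pos += entry_width
        index += entry >> patch_width
        deltas[index] |= (entry & ((1 << patch_width) - 1)) << width
    return [base + d for d in deltas]


print(decode_patched_base(ENCODED))
```

Under these assumptions the header decodes to a width of 8 bits, a length of 20 (stored as 19), a 2-byte base of 2000, a 12-bit patch width, a 2-bit gap width and one patch. The single patch lands on index 3 and restores 1000000, so the output matches the proposed input sequence.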


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
