[ https://issues.apache.org/jira/browse/HIVE-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bogdan Raducanu updated HIVE-12537:
-----------------------------------
Description:
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
Writing 1 million constant int32 values produces an ORC file of about 1 MB, i.e. roughly one byte per value.
Surprisingly, writing 1 million consecutive ints produces a much smaller file.
Code and FileDump attached.
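For comparison, here is a rough estimate of the size one might expect if the constant column were encoded as fixed-delta runs. It is only a sketch and not taken from the attached Main.java: it assumes (from my reading of the ORC RLEv2 description) that a fixed-delta run covers up to 512 values and costs a 2-byte header plus a zigzag varint base value plus a zigzag varint delta. The class and method names are made up for illustration.

```java
public class Rlev2Estimate {
    // Number of bytes a base-128 varint needs for a non-negative value.
    public static int varintBytes(long unsigned) {
        int n = 1;
        while ((unsigned >>>= 7) != 0) {
            n++;
        }
        return n;
    }

    // Zigzag mapping so that small signed values become small unsigned ones.
    public static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Estimated bytes for `count` values that start at `firstValue` and
    // change by a constant `delta`. ASSUMPTION: every run holds at most
    // 512 values and costs 2 header bytes + base varint + delta varint.
    public static long estimateFixedDelta(long count, long firstValue, long delta) {
        final int maxRun = 512;
        long bytes = 0;
        long base = firstValue;
        for (long done = 0; done < count; done += maxRun) {
            bytes += 2 + varintBytes(zigzag(base)) + varintBytes(zigzag(delta));
            base += delta * maxRun; // first value of the next run
        }
        return bytes;
    }

    public static void main(String[] args) {
        // 1 million copies of the constant 123, as in the report.
        System.out.println(estimateFixedDelta(1_000_000L, 123L, 0L)); // prints 9770
    }
}
```

Under these assumptions the constant column would need on the order of 10 KB rather than 1 MB, which is why the observed size looks suspicious.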
was:
Putting 1 million constant int32 values produces an ORC file of 1MB.
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
Will attach code.
Output from FileDump:
Rows: 1000000
Compression: NONE
Type: int
Stripe Statistics:
  Stripe 1:
    Column 0: count: 1000000 hasNull: false min: 123 max: 123 sum: 123000000
File Statistics:
  Column 0: count: 1000000 hasNull: false min: 123 max: 123 sum: 123000000
Stripes:
  Stripe: offset: 3 data: 1003847 rows: 1000000 tail: 41 index: 2871
    Stream: column 0 section ROW_INDEX start: 3 length 2871
    Stream: column 0 section DATA start: 2874 length 1003847
    Encoding column 0: DIRECT_V2
File length: 1006860 bytes
Padding length: 0 bytes
Padding ratio: 0%
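To make the numbers in the dump easier to read, the small sketch below redoes the arithmetic: bytes per value in the DATA stream, and how many bytes of the file lie after the stripe. All inputs are copied from the FileDump output above; the class and method names are only illustrative, and the guess that the remaining bytes are file-level metadata (footer and postscript) is an assumption.

```java
public class OrcDumpArithmetic {
    // Average number of DATA-stream bytes spent per row.
    public static double bytesPerValue(long dataBytes, long rows) {
        return (double) dataBytes / rows;
    }

    // Bytes left in the file after the stripe ends
    // (stripe end = offset + index + data + tail).
    public static long bytesAfterStripe(long fileLength, long offset,
                                        long index, long data, long tail) {
        return fileLength - (offset + index + data + tail);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long offset = 3L, index = 2_871L, data = 1_003_847L, tail = 41L;
        long fileLength = 1_006_860L;

        // About 1.0038 bytes per value for a column that never changes.
        System.out.println(bytesPerValue(data, rows));
        // 98 bytes remain after the stripe, presumably file-level metadata.
        System.out.println(bytesAfterStripe(fileLength, offset, index, data, tail));
    }
}
```

So nearly the whole file is the DATA stream of column 0, at about one byte per row.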
> RLEv2 doesn't seem to work
> --------------------------
>
> Key: HIVE-12537
> URL: https://issues.apache.org/jira/browse/HIVE-12537
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 1.2.1
> Reporter: Bogdan Raducanu
> Labels: orc, orcfile
> Attachments: Main.java, orcdump.txt
>
>
> Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
> Writing 1 million constant int32 values produces an ORC file of about 1 MB, i.e. roughly one byte per value.
> Surprisingly, writing 1 million consecutive ints produces a much smaller file.
> Code and FileDump attached.