[ https://issues.apache.org/jira/browse/HIVE-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damien Carol updated HIVE-12537:
--------------------------------
Description:
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
Writing 1 million constant int32 values produces an ORC file of 1 MB.
Surprisingly, writing 1 million consecutive ints produces a much smaller file.
Code and FileDump output are attached.
{code}
// Reflection-based inspector for a single Integer column.
ObjectInspector inspector =
    ObjectInspectorFactory.getReflectionObjectInspector(
        Integer.class,
        ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
Writer w = OrcFile.createWriter(new Path("/tmp/my.orc"),
    OrcFile.writerOptions(new Configuration())
        .compress(CompressionKind.NONE)
        .inspector(inspector)
        .encodingStrategy(OrcFile.EncodingStrategy.COMPRESSION)
        .version(OrcFile.Version.V_0_12)
);
// Write the same constant value 1 million times.
for (int i = 0; i < 1000000; ++i) {
    w.addRow(123);
}
w.close();
{code}
was:
Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
Writing 1 million constant int32 values produces an ORC file of 1 MB.
Surprisingly, writing 1 million consecutive ints produces a much smaller file.
Code and FileDump output are attached.
> RLEv2 doesn't seem to work
> --------------------------
>
> Key: HIVE-12537
> URL: https://issues.apache.org/jira/browse/HIVE-12537
> Project: Hive
> Issue Type: Bug
> Components: File Formats, ORC
> Affects Versions: 1.2.1
> Reporter: Bogdan Raducanu
> Labels: orc, orcfile
> Attachments: Main.java, orcdump.txt
>
>
> Perhaps I'm doing something wrong, or perhaps this is actually working as expected.
> Writing 1 million constant int32 values produces an ORC file of 1 MB.
> Surprisingly, writing 1 million consecutive ints produces a much smaller file.
> Code and FileDump output are attached.
> {code}
> // Reflection-based inspector for a single Integer column.
> ObjectInspector inspector =
>     ObjectInspectorFactory.getReflectionObjectInspector(
>         Integer.class,
>         ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
> Writer w = OrcFile.createWriter(new Path("/tmp/my.orc"),
>     OrcFile.writerOptions(new Configuration())
>         .compress(CompressionKind.NONE)
>         .inspector(inspector)
>         .encodingStrategy(OrcFile.EncodingStrategy.COMPRESSION)
>         .version(OrcFile.Version.V_0_12)
> );
>
> // Write the same constant value 1 million times.
> for (int i = 0; i < 1000000; ++i) {
>     w.addRow(123);
> }
> w.close();
> {code}
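
For a sense of scale, here is a rough, standalone estimate of how small the integer stream for the data above could be under RLEv2. It is only a sketch: the byte layouts (fixed-delta runs of up to 512 values with a 2-byte header plus varint base and delta, and Short Repeat runs of up to 10 values with a 1-byte header plus the value bytes) are assumptions taken from a reading of the ORC specification, not from Hive's writer code, and the class and method names are hypothetical. It ignores stripe footers, row indexes, and other file overhead, and it does not explain what the writer in this report actually does.

```java
// Back-of-envelope RLEv2 size estimate. All layouts here are assumptions
// based on the ORC specification, not on Hive's actual writer.
final class RleV2SizeEstimate {

    /** Number of bytes needed to store an unsigned value as a base-128 varint. */
    static int varintBytes(long unsigned) {
        int n = 1;
        while ((unsigned >>>= 7) != 0) {
            n++;
        }
        return n;
    }

    /** Zigzag-maps a signed value so that small magnitudes stay small. */
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /**
     * Assumed fixed-delta layout: each run covers up to 512 values and costs
     * a 2-byte header, a varint base value, and a varint delta.
     */
    static long deltaBytes(long count, long base, long delta) {
        long total = 0;
        for (long done = 0; done < count; done += 512) {
            long runBase = base + done * delta;
            total += 2 + varintBytes(zigzag(runBase)) + varintBytes(zigzag(delta));
        }
        return total;
    }

    /**
     * Assumed Short Repeat layout: each run covers up to 10 equal values and
     * costs a 1-byte header plus the value in whole bytes. A tail shorter
     * than the minimum run length of 3 is not treated specially.
     */
    static long shortRepeatBytes(long count, long value) {
        long z = zigzag(value);
        int bits = 64 - Long.numberOfLeadingZeros(z);
        int valueBytes = Math.max(1, (bits + 7) / 8);
        long runs = (count + 9) / 10;
        return runs * (1 + valueBytes);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("constant 123, fixed delta:  " + deltaBytes(n, 123, 0) + " bytes");
        System.out.println("constant 123, short repeat: " + shortRepeatBytes(n, 123) + " bytes");
        System.out.println("consecutive, fixed delta:   " + deltaBytes(n, 0, 1) + " bytes");
    }
}
```

Under these assumptions, 1 million copies of 123 come to 9770 bytes as fixed-delta runs or 200000 bytes as Short Repeat runs, so a 1 MB file with compression set to NONE looks larger than either estimate would suggest.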