[ https://issues.apache.org/jira/browse/ORC-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680227#comment-16680227 ]
ASF GitHub Bot commented on ORC-363:
------------------------------------
wgtmac commented on issue #306: ORC-363: Enable zstd for java writer/reader
URL: https://github.com/apache/orc/pull/306#issuecomment-437125329
I have successfully built Hadoop v2.9.1 with native zstd support in an
Ubuntu 18.10 Docker image and used the Java ORC tool to read an ORC file
compressed with ZSTD. The output is below:
> root@80d6bc18fa17:~/work/orc/java# java -Djava.library.path=$HADOOP_HOME/lib/native -cp $ORC_CLASSPATH:$HADOOP_CLASSPATH:$M2_HOME/io/airlift/aircompressor/0.10/aircompressor-0.10.jar org.apache.orc.tools.Driver meta /tmp/orc/2.orc
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/root/work/hadoop/hadoop-dist/target/hadoop-2.9.1/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/root/work/hadoop/hadoop-dist/target/hadoop-2.9.1/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/root/work/hadoop/hadoop-dist/target/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Processing data file /tmp/orc/2.orc [length: 159282853]
> 2018-11-08 18:34:25,739 INFO [main] impl.OrcCodecPool (OrcCodecPool.java:getCodec(56)) - Got brand-new codec ZSTD
> Structure for /tmp/orc/2.orc
> File Version: 0.11 with FUTURE
> 2018-11-08 18:34:25,852 INFO [main] impl.ReaderImpl (ReaderImpl.java:rows(567)) - Reading ORC rows from /tmp/orc/2.orc with {include: null, offset: 0, length: 9223372036854775807, includeAcidColumns: true}
> 2018-11-08 18:34:25,868 INFO [main] impl.RecordReaderImpl (RecordReaderImpl.java:<init>(184)) - Reader schema not provided -- using file schema struct<Column_0:string,Column_1:string,Column_2:string,Column_3:string,Column_4:string>
> Rows: 215138
> Compression: ZSTD
> Compression size: 1048576
> Type: struct<Column_0:string,Column_1:string,Column_2:string,Column_3:string,Column_4:string>
>
> Stripe Statistics:
> 2018-11-08 18:34:26,747 INFO [main] impl.OrcCodecPool (OrcCodecPool.java:getCodec(56)) - Got brand-new codec ZSTD
> Stripe 1:
> Column 0: count: 178024 hasNull: false
> Column 1: count: 178024 hasNull: false
> Column 2: count: 29945 hasNull: true
> Column 3: count: 178024 hasNull: false
> Column 4: count: 178024 hasNull: false
> Column 5: count: 163643 hasNull: true
> Stripe 2:
> .... // remaining part has been omitted
@omalley Could you take another look? We expect this can ship in the 1.6
release. Thanks!
> Enable zstd decompression in ORC Java reader
> --------------------------------------------
>
> Key: ORC-363
> URL: https://issues.apache.org/jira/browse/ORC-363
> Project: ORC
> Issue Type: Bug
> Reporter: Xiening Dai
> Assignee: Xiening Dai
> Priority: Major
>
> Update to the aircompressor lib 0.11 and enable zstd decompression.