[
https://issues.apache.org/jira/browse/SPARK-19109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sydt updated SPARK-19109:
-------------------------
Attachment: InsertPic_.png
Hi, I hit this problem and resolved it by recompiling the source code of
hive-exec-1.2.1-spark2.jar in spark-2.1.0/jars.
First, get the source code from https://github.com/JoshRosen
Second, download the patch
https://issues.apache.org/jira/secure/attachment/12750949/HIVE-11592.1.patch
and apply it to ReaderImpl.java (I did this in the IntelliJ IDE).
Then recompile and package the source, and replace the original jar in
spark/jars.
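For reference, the core of the HIVE-11592 patch is to parse the ORC metadata
section through a CodedInputStream whose size limit has been raised above
protobuf's 64MB default. A minimal sketch of that idea (the helper class below
is illustrative only, not the actual patched ReaderImpl code):
{code}
import java.io.IOException;

import com.google.protobuf.CodedInputStream;
import org.apache.hadoop.hive.ql.io.orc.OrcProto;

// Illustrative helper, not the actual HIVE-11592 patch: parse the raw
// bytes of the ORC metadata section with a raised protobuf size limit
// instead of relying on the 64MB default.
final class OrcMetadataParser {
  static OrcProto.Metadata parse(byte[] buf, int offset, int len)
      throws IOException {
    CodedInputStream in = CodedInputStream.newInstance(buf, offset, len);
    in.setSizeLimit(Integer.MAX_VALUE); // default limit is 64MB
    return OrcProto.Metadata.parseFrom(in);
  }
}
{code}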
[email protected]
From: Dongjoon Hyun (JIRA)
Date: 2017-08-18 15:57
To: sydt2011
Subject: [jira] [Commented] (SPARK-19109) ORC metadata section can sometimes exceed protobuf message size limit
[
https://issues.apache.org/jira/browse/SPARK-19109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131877#comment-16131877
]
Dongjoon Hyun commented on SPARK-19109:
---------------------------------------
Hi, [~nseggert] and [~wangchao2017].
Could you give us a way to reproduce this?
> ORC metadata section can sometimes exceed protobuf message size limit
> ---------------------------------------------------------------------
>
> Key: SPARK-19109
> URL: https://issues.apache.org/jira/browse/SPARK-19109
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.3, 2.0.2, 2.1.0, 2.2.0
> Reporter: Nic Eggert
> Attachments: InsertPic_.png
>
>
> Basically, Spark inherits HIVE-11592 from its Hive dependency. From that
> issue:
> If there are too many small stripes with many columns, the overhead of
> storing metadata (column stats) can exceed the default protobuf message size
> limit of 64MB. Reading such files throws the exception below (a reproduction
> sketch follows the trace):
> {code}
> Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
> at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
> at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
> at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
> at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
> at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
> at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314)
> at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
> at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
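> One way to try to produce a file of this shape, as a hypothetical, untested
> sketch: the column count, row count, and stripe-size value are illustrative,
> and whether the Hive config hive.exec.orc.default.stripe.size is honored by
> Spark's ORC writer through this path is an assumption.
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import static org.apache.spark.sql.functions.*;
>
> public class OrcMetadataRepro {
>   public static void main(String[] args) {
>     SparkSession spark = SparkSession.builder()
>         .appName("orc-metadata-repro").getOrCreate();
>
>     // Shrink the default stripe size so each task writes many tiny
>     // stripes (Hive config name; assumed to reach the ORC writer).
>     spark.sparkContext().hadoopConfiguration()
>         .set("hive.exec.orc.default.stripe.size", "65536");
>
>     // Many string columns: string min/max statistics are what bloat
>     // the per-stripe metadata in the file tail.
>     Dataset<Row> df = spark.range(1000000).toDF("id");
>     for (int i = 0; i < 500; i++) {
>       df = df.withColumn("c" + i,
>           concat(lit("value_"), col("id").cast("string")));
>     }
>     df.write().mode("overwrite").orc("/tmp/orc-metadata-repro");
>
>     // If the metadata section crossed the 64MB protobuf limit, this
>     // read should fail with the InvalidProtocolBufferException above.
>     spark.read().orc("/tmp/orc-metadata-repro").count();
>
>     spark.stop();
>   }
> }
> {code}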
> This is fixed in Hive 1.3, so it should be fairly straightforward to pick up
> the patch.
> As a side note: Spark's management of its Hive fork/dependency seems
> incredibly arcane to me. Surely there's a better way than publishing to
> central from developers' personal repos.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]