Marc Catrisse created BEAM-10464:
------------------------------------
Summary: [ HBaseIO ] - Protocol message was too large. May be
malicious.
Key: BEAM-10464
URL: https://issues.apache.org/jira/browse/BEAM-10464
Project: Beam
Issue Type: Bug
Components: beam-community
Reporter: Marc Catrisse
Assignee: Aizhamal Nurmamat kyzy
Hi! I just got the following error perfoming a HBaseIO.read() from scan.
{code:java}
// Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 0.0 (TID 3,
ip-172-31-9-212.eu-west-1.compute.internal, executor 1):
java.lang.IllegalStateException: Error decoding bytes for coder:
WindowedValue$FullWindowedValueCoder(HBaseResultCoder,GlobalWindow$Coder)Job
aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
failure: Lost task 0.3 in stage 0.0 (TID 3,
ip-172-31-9-212.eu-west-1.compute.internal, executor 1):
java.lang.IllegalStateException: Error decoding bytes for coder:
WindowedValue$FullWindowedValueCoder(HBaseResultCoder,GlobalWindow$Coder) at
org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:75)
at
org.apache.beam.runners.spark.translation.BoundedDataset.lambda$cache$6a9a5e8d$2(BoundedDataset.java:117)
at
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at
scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31) at
org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:137)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) at
org.apache.beam.runners.spark.translation.MultiDoFnFunction.call(MultiDoFnFunction.java:125)
at
org.apache.beam.runners.spark.translation.MultiDoFnFunction.call(MultiDoFnFunction.java:63)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337) at
org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335) at
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:286) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at
org.apache.spark.scheduler.Task.run(Task.scala:123) at
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)Caused by:
org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException:
Protocol message was too large. May be malicious. Use
CodedInputStream.setSizeLimit() to increase the size limit. at
org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4694)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4658)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4767)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4762)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
at
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.parseDelimitedFrom(ClientProtos.java:5131)
at
org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:50)
at
org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:34)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) at
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:602)
at
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:593)
at
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:539)
at
org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:73)
... 61 more
{code}
Actually there isn't an easy way to change the current sizeLimit from the
protobuf decoder. How should we manage Big Data Datasets stored in HBase?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)