Marc Catrisse created BEAM-10464:
------------------------------------

             Summary: [ HBaseIO ] - Protocol message was too large.  May be 
malicious.
                 Key: BEAM-10464
                 URL: https://issues.apache.org/jira/browse/BEAM-10464
             Project: Beam
          Issue Type: Bug
          Components: beam-community
            Reporter: Marc Catrisse
            Assignee: Aizhamal Nurmamat kyzy


Hi! I just got the following error perfoming a HBaseIO.read() from scan. 
{code:java}
// Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 0.0 (TID 3, 
ip-172-31-9-212.eu-west-1.compute.internal, executor 1): 
java.lang.IllegalStateException: Error decoding bytes for coder: 
WindowedValue$FullWindowedValueCoder(HBaseResultCoder,GlobalWindow$Coder)Job 
aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent 
failure: Lost task 0.3 in stage 0.0 (TID 3, 
ip-172-31-9-212.eu-west-1.compute.internal, executor 1): 
java.lang.IllegalStateException: Error decoding bytes for coder: 
WindowedValue$FullWindowedValueCoder(HBaseResultCoder,GlobalWindow$Coder) at 
org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:75)
 at 
org.apache.beam.runners.spark.translation.BoundedDataset.lambda$cache$6a9a5e8d$2(BoundedDataset.java:117)
 at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at 
scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31) at 
org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:137)
 at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
 at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
 at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) at 
org.apache.beam.runners.spark.translation.MultiDoFnFunction.call(MultiDoFnFunction.java:125)
 at 
org.apache.beam.runners.spark.translation.MultiDoFnFunction.call(MultiDoFnFunction.java:63)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337) at 
org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335) at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
 at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156) at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882) 
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:286) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
org.apache.spark.scheduler.Task.run(Task.scala:123) at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)Caused by: 
org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException:
 Protocol message was too large.  May be malicious.  Use 
CodedInputStream.setSizeLimit() to increase the size limit. at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
 at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4694)
 at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4658)
 at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4767)
 at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4762)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
 at 
org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
 at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.parseDelimitedFrom(ClientProtos.java:5131)
 at 
org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:50) 
at 
org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:34) 
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159) at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:602)
 at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:593)
 at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:539)
 at 
org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:73)
 ... 61 more
{code}
Actually there isn't an easy way to change the current sizeLimit from the 
protobuf decoder. How should we manage Big Data Datasets stored in HBase?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to