[ https://issues.apache.org/jira/browse/BEAM-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Catrisse updated BEAM-10464:
---------------------------------
    Description: 
Hi! I just got the following error performing an HBaseIO.read() from a scan.
{code:java}
Caused by: org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4694)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4658)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4767)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4762)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.parseDelimitedFrom(ClientProtos.java:5131)
    at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:50)
    at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:34)
    at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:602)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:593)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:539)
    at org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:73)
    ... 61 more
{code}
It seems I'm scanning a column family larger than 64 MB, but HBaseIO doesn't
provide any way to change the current sizeLimit of the protobuf decoder.
How should we handle large datasets stored in HBase?
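
To make the failure mode concrete, here is a minimal, standard-library-only sketch of a length-delimited reader with a configurable size limit. It is not protobuf's or HBaseIO's code: the class DelimitedFrameReader and its methods are hypothetical names used only for illustration, and the up-front length check is a simplification of what CodedInputStream does while refilling its buffer. The 64 MB default mirrors the limit mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: a length-delimited frame reader with a configurable size limit. */
final class DelimitedFrameReader {
  /** 64 MB, the limit mentioned in the report. */
  static final int DEFAULT_SIZE_LIMIT = 64 << 20;

  private DelimitedFrameReader() {}

  /** Encodes a frame as a base-128 varint length followed by the payload bytes. */
  static byte[] writeDelimited(byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(payload.length + 5);
    int length = payload.length;
    while ((length & ~0x7F) != 0) {
      out.write((length & 0x7F) | 0x80); // low 7 bits, continuation bit set
      length >>>= 7;
    }
    out.write(length);
    out.write(payload, 0, payload.length);
    return out.toByteArray();
  }

  /** Reads one frame and rejects any frame longer than sizeLimit bytes. */
  static byte[] readDelimited(InputStream in, int sizeLimit) throws IOException {
    int length = readVarint32(in);
    if (length < 0 || length > sizeLimit) {
      throw new IOException(
          "Frame of " + length + " bytes exceeds the size limit of " + sizeLimit + " bytes");
    }
    byte[] payload = new byte[length];
    new DataInputStream(in).readFully(payload);
    return payload;
  }

  /** Decodes the varint length prefix (at most 5 bytes for a 32-bit value). */
  private static int readVarint32(InputStream in) throws IOException {
    int result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("Stream ended inside the length prefix");
      }
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IOException("Malformed varint length prefix");
  }
}
```

With this sketch, a 300-byte frame decodes when the caller passes a limit of 1024 and is rejected when the limit is 100. The trace above appears to show the same situation at a larger scale: the decoder is created with its default limit and the caller has no parameter through which to raise it.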

  was:
Hi! I just got the following error performing an HBaseIO.read() from a scan.
{code:java}
Caused by: org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4694)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4658)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4767)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4762)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.parseDelimitedFrom(ClientProtos.java:5131)
    at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:50)
    at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:34)
    at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:602)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:593)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:539)
    at org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:73)
    ... 61 more
{code}
Currently there is no easy way to change the sizeLimit of the protobuf
decoder. How should we handle large datasets stored in HBase?


> [ HBaseIO ] - Protocol message was too large.  May be malicious.
> ----------------------------------------------------------------
>
>                 Key: BEAM-10464
>                 URL: https://issues.apache.org/jira/browse/BEAM-10464
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-community
>            Reporter: Marc Catrisse
>            Assignee: Aizhamal Nurmamat kyzy
>            Priority: P2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
