[
https://issues.apache.org/jira/browse/IMPALA-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577012#comment-17577012
]
ASF subversion and git services commented on IMPALA-11401:
----------------------------------------------------------
Commit ba60c5f29a3ed4164a9a08cdc816e291c4f85352 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ba60c5f29 ]
IMPALA-11401,IMPALA-10794: Add logs and thread names for catalogd RPCs
We've seen catalogd throw OutOfMemoryError when serializing large
responses (i.e. size > 2GB). However, the related table names are
missing in the logs. Admins would like to see the table names so they
can blacklist those tables until they are optimized (e.g. by reducing
partitions).
To improve supportability, this patch adds logging in the catalogd RPC
code paths to record some details of each request, and adds thread
name annotations to improve the readability of jstack output.
Tests:
- Add unit tests for short descriptions of requests.
- Manually add code to throw OutOfMemoryError and verify the logs are
shown as expected.
- Run test_concurrent_ddls.py and metadata tests. Capture jstacks and
verify the thread annotations are shown.
- Run CORE tests
Change-Id: Iac7f2eda8b95643a3d3c3bef64ea71b67b20595a
Reviewed-on: http://gerrit.cloudera.org:8080/18772
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
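The pattern the commit describes (a cheap short description of the request, surfaced both in the thread name for jstack and in the error path on OOM) can be sketched roughly as below. The class and method names (`CatalogRpcLogging`, `shortDescription`, `serializeWithLogging`) are illustrative assumptions, not Impala's actual API; the real change lives in the catalogd RPC code paths.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the logging pattern IMPALA-11401/IMPALA-10794 add:
// build a one-line summary of the RPC request up front, set it as the thread
// name so jstack shows which request each thread serves, and log it again if
// serialization throws OutOfMemoryError. Names are illustrative, not Impala's.
public class CatalogRpcLogging {

  // Builds a short description, e.g. "GetPartialCatalogObject TABLE:db1.big_tbl".
  static String shortDescription(String rpcName, String tableName) {
    return rpcName + " " + (tableName != null ? "TABLE:" + tableName : "<no table>");
  }

  // Runs the (possibly huge) serialization with the request description
  // appended to the thread name; restores the name afterwards since RPC
  // threads are pooled and reused.
  static byte[] serializeWithLogging(String desc, Callable<byte[]> serialize)
      throws Exception {
    Thread current = Thread.currentThread();
    String oldName = current.getName();
    current.setName(oldName + " [" + desc + "]");
    try {
      return serialize.call();
    } catch (OutOfMemoryError e) {
      // Table-level failure: record which request blew past the array limit.
      System.err.println("OutOfMemoryError while serializing response of: " + desc);
      throw e;
    } finally {
      current.setName(oldName);
    }
  }

  public static void main(String[] args) throws Exception {
    String desc = shortDescription("GetPartialCatalogObject", "db1.big_tbl");
    byte[] resp = serializeWithLogging(desc, () -> new byte[]{1, 2, 3});
    System.out.println(desc + " -> " + resp.length + " bytes");
  }
}
```

With this shape, an admin seeing the OOM log line immediately knows which table's response failed to serialize, and a jstack taken during a long serialization shows the request in the thread name.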
> Catalogd should log the table names causing OOM on array limit
> --------------------------------------------------------------
>
> Key: IMPALA-11401
> URL: https://issues.apache.org/jira/browse/IMPALA-11401
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Catalogd can throw OOM errors when serializing thrift objects larger than
> the 2GB byte array limit, e.g. when serializing responses of execDdl or
> getPartialCatalogObject requests. Such OOM errors are table-level
> failures; they don't mean the server is out of memory, and catalogd can
> still process other RPC requests.
> To improve the supportability of catalogd, it should log the table name and
> some details of the request when throwing such OOM errors. Currently the log
> contains no details:
> {noformat}
> I0617 04:00:24.341722 534809 jni-util.cc:288] java.lang.OutOfMemoryError
> at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
> at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:202)
> at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:531)
> at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:480)
> at org.apache.hadoop.hive.metastore.api.FieldSchema.write(FieldSchema.java:418)
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1451)
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1278)
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor.write(StorageDescriptor.java:1144)
> at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1414)
> at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1238)
> at org.apache.hadoop.hive.metastore.api.Partition.write(Partition.java:1099)
> at org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:862)
> at org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:759)
> at org.apache.impala.thrift.TPartialPartitionInfo.write(TPartialPartitionInfo.java:665)
> at org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:914)
> at org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:790)
> at org.apache.impala.thrift.TPartialTableInfo.write(TPartialTableInfo.java:688)
> at org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:977)
> at org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:857)
> at org.apache.impala.thrift.TGetPartialCatalogObjectResponse.write(TGetPartialCatalogObjectResponse.java:739)
> at org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
> at org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:254)
> I0617 04:00:24.341833 534809 status.cc:129] OutOfMemoryError: null
> @ 0xc1d7f3
> @ 0x13a8679
> @ 0xc05bf3
> @ 0xbf3433
> @ 0xd133c4
> @ 0xd07d33
> @ 0xd1f3a2
> @ 0x10d214a
> @ 0x10c5702
> @ 0x144de71
> @ 0x144f2ea
> @ 0x1c9f8d1
> @ 0x7ff6aca68ea4
> @ 0x7ff6a9540b0c
> E0617 04:00:24.341909 534809 catalog-server.cc:209] OutOfMemoryError: null
> {noformat}
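The stack trace ends in ByteArrayOutputStream.hugeCapacity because Java arrays are int-indexed: a byte[] can hold at most Integer.MAX_VALUE elements (the JDK guards at Integer.MAX_VALUE - 8), so any thrift response whose serialized size exceeds ~2GB cannot fit in the single array TSerializer writes into. The sketch below re-traces the JDK 8 grow()/hugeCapacity() arithmetic without allocating anything, to show where a >2GB response fails; it is an illustration of the JDK logic, not Impala code.

```java
// Re-traces the capacity arithmetic of JDK 8's ByteArrayOutputStream to show
// why serializing a >2GB thrift response throws OutOfMemoryError: the int
// capacity the stream computes overflows, and hugeCapacity() treats a
// negative minCapacity as an unsatisfiable request.
public class ArrayLimitDemo {
  // The JDK's guard: some VMs reserve header words in arrays.
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  // Mirrors ByteArrayOutputStream.grow(): double the capacity, fall back to
  // minCapacity, and clamp via the hugeCapacity() logic, which throws
  // OutOfMemoryError when minCapacity has wrapped negative (i.e. > 2GB asked).
  static int nextCapacity(int oldCapacity, int minCapacity) {
    int newCapacity = oldCapacity << 1;
    if (newCapacity - minCapacity < 0) newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0) {
      if (minCapacity < 0) throw new OutOfMemoryError();  // int overflow: >2GB
      newCapacity = (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }
    return newCapacity;
  }

  public static void main(String[] args) {
    // A response needing 3GB: 3221225472 does not fit in an int, so the
    // minCapacity the stream computes wraps negative and grow() throws.
    long needed = 3L * 1024 * 1024 * 1024;
    int wrapped = (int) needed;  // negative after overflow
    try {
      nextCapacity(1 << 30, wrapped);
      System.out.println("no error");
    } catch (OutOfMemoryError e) {
      System.out.println("OutOfMemoryError: serialized size " + needed
          + " exceeds the byte[] limit");
    }
  }
}
```

This is why the issue calls these table-level failures: the JVM heap may have plenty of room, but no single byte[] can hold the response, so only the one oversized request fails.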
--
This message was sent by Atlassian Jira
(v8.20.10#820010)