Quanlong Huang created IMPALA-13818:
---------------------------------------

             Summary: Wasting space due to duplicated TColumnType between THdfsPartitions
                 Key: IMPALA-13818
                 URL: https://issues.apache.org/jira/browse/IMPALA-13818
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang
         Attachments: TDescriptorTable.png, THdfsPartition_mem_footprint-1.png

THdfsPartition is used in at least two places:
 * In the legacy catalog mode, catalogd propagates metadata to coordinators.
 * In TQueryExecRequest, it's used in the tableDescriptors sent to the backend 
as part of query plans and fragments.

It has a list of expressions to represent the partition values:
{code:java}
struct THdfsPartition {
  // These are Literal expressions
  7: list<Exprs.TExpr> partitionKeyExprs
  ...
}{code}
E.g. here is a partition (year=2009, month=1) of the functional.alltypes table:
{noformat}
THdfsPartition {
          07: partitionKeyExprs (list) = list<struct>[2] {
            [0] = TExpr {
              01: nodes (list) = list<struct>[1] {
                [0] = TExprNode {
                  01: node_type (i32) = 2,
                  02: type (struct) = TColumnType {
                    01: types (list) = list<struct>[1] {
                      [0] = TTypeNode {
                        01: type (i32) = 0,
                        02: scalar_type (struct) = TScalarType {
                          01: type (i32) = 5,
                        },
                      },
                    },
                  },
                  03: num_children (i32) = 0,
                  04: is_constant (bool) = true,
                  11: int_literal (struct) = TIntLiteral {
                    01: value (i64) = 2009,
                  },
                  23: is_codegen_disabled (bool) = false,
                },
              },
            },
            [1] = TExpr {
              01: nodes (list) = list<struct>[1] {
                [0] = TExprNode {
                  01: node_type (i32) = 2,
                  02: type (struct) = TColumnType {
                    01: types (list) = list<struct>[1] {
                      [0] = TTypeNode {
                        01: type (i32) = 0,
                        02: scalar_type (struct) = TScalarType {
                          01: type (i32) = 5,
                        },
                      },
                    },
                  },
                  03: num_children (i32) = 0,
                  04: is_constant (bool) = true,
                  11: int_literal (struct) = TIntLiteral {
                    01: value (i64) = 1,
                  },
                  23: is_codegen_disabled (bool) = false,
                },
              },
            },
          },
          09: file_desc (list) = list<struct>[1] {
            [0] = THdfsFileDesc {
              01: file_desc_data (string) = "\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0e\x00 \x00\x1c\x00\x10\x00\x00\x00\b\x00\x04\x00\x0e\x00\x00\x00\x1c\x00\x00\x00\xce\x80\x17\xf3\x93\x01\x00\x00\xd1O\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\f\x00\x00\x00\x01\x00\x00\x00 \x00\x00\x00\n\x00\x00\x00090101.txt\x00\x00\f\x00\x14\x00\x00\x00\f\x00\b\x00\x04\x00\f\x00\x00\x00\x10\x00\x00\x00\x18\x00\x00\x00\xd1O\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x01\x00\x02\x00\x00\x00\x03\x00\x00\x00\x02\x00\x01\x00\x00\x00\x00\x00",
            },
          },
          10: location (struct) = THdfsPartitionLocation {
            01: prefix_index (i32) = 0,
            02: suffix (string) = "year=2009/month=1",
          },
          11: access_level (i32) = 1,
          12: stats (struct) = TTableStats {
            01: num_rows (i64) = 310,
          },
          13: is_marked_cached (bool) = false,
          14: id (i64) = 1,
          15: hms_parameters (map) = map<string,string>[8] {
            "STATS_GENERATED" -> "TASK",
            "impala.events.catalogServiceId" -> "3378db64f1864d2a:afdb738fe029883c",
            "impala.events.catalogVersion" -> "12277",
            "numFiles" -> "1",
            "numFilesErasureCoded" -> "0",
            "numRows" -> "310",
            "totalSize" -> "20433",
            "transient_lastDdlTime" -> "1734950224",
          },
          16: num_blocks (i64) = 1,
          17: total_file_size_bytes (i64) = 20433,
          19: has_incremental_stats (bool) = false,
          25: partition_name (string) = "year=2009/month=1",
          26: prev_id (i64) = -1,
          27: hdfs_storage_descriptor (struct) = THdfsStorageDescriptor {
            01: lineDelim (byte) = 0x0a,
            02: fieldDelim (byte) = 0x2c,
            03: collectionDelim (byte) = 0x2c,
            04: mapKeyDelim (byte) = 0x2c,
            05: escapeChar (byte) = 0x5c,
            06: quoteChar (byte) = 0x2c,
            07: fileFormat (i32) = 0,
            08: blockSize (i32) = 0,
          },
        }{noformat}
The partitionKeyExprs of every partition repeat the same partition column 
types; only the literal values differ. This wastes memory and network 
bandwidth. Here is an example from a heap dump:

!THdfsPartition_mem_footprint.png|width=655,height=447!
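
The duplication visible in the Thrift dump above can be made concrete with a small model. This is only an illustrative sketch: it uses plain Python dicts shaped like the dumped structs rather than the generated Thrift classes, and the helper names and the second partition (month=2) are assumptions added for the example.

```python
import copy

# Plain-dict stand-in for the TColumnType shown in the dump
# (TTypeNode.type = 0, TScalarType.type = 5). Not the real Thrift class.
INT_COLUMN_TYPE = {"types": [{"type": 0, "scalar_type": {"type": 5}}]}


def make_int_literal_expr(value):
    """Build a dict shaped like a single-node TExpr holding an int literal."""
    return {
        "nodes": [{
            "node_type": 2,
            # Each expression carries its own copy of the column type.
            "type": copy.deepcopy(INT_COLUMN_TYPE),
            "num_children": 0,
            "is_constant": True,
            "int_literal": {"value": value},
            "is_codegen_disabled": False,
        }]
    }


def column_types(partition_key_exprs):
    """Return the type sub-structure of every partition key expression."""
    return [expr["nodes"][0]["type"] for expr in partition_key_exprs]


# Two partitions of a table partitioned by (year, month).
# The month=2 partition is hypothetical, added for illustration.
partitions = {
    "year=2009/month=1": [make_int_literal_expr(2009), make_int_literal_expr(1)],
    "year=2009/month=2": [make_int_literal_expr(2009), make_int_literal_expr(2)],
}

all_types = [t for exprs in partitions.values() for t in column_types(exprs)]
total_type_copies = len(all_types)
distinct_types = {repr(t) for t in all_types}

print(total_type_copies, "type copies,", len(distinct_types), "distinct type")
```

In this model the number of type copies grows with partitions times partition columns, while the number of distinct types stays at the number of distinct partition column types.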

The query behind this heap dump reads a single table with 6M partitions and 6M 
files. It fails with an OutOfMemoryError because serializing the 
DescriptorTable exceeds the 2GB JVM byte array limit:
{noformat}
java.lang.Thread @ 0x7fdf6d220768 : Thread-12
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at java.util.Arrays.copyOf([BI)[B (Arrays.java:3236)
  at java.io.ByteArrayOutputStream.toByteArray()[B (ByteArrayOutputStream.java:191)
  at org.apache.thrift.TSerializer.serialize(Lorg/apache/thrift/TBase;)[B (TSerializer.java:85)
  at org.apache.impala.common.JniUtil.serializeToThrift(Lorg/apache/thrift/TBase;)[B (JniUtil.java:109)
  at org.apache.impala.analysis.DescriptorTable.toSerializedThrift()Lorg/apache/impala/thrift/TDescriptorTableSerialized; (DescriptorTable.java:260)
  at org.apache.impala.service.Frontend.getPlannedExecRequest(Lorg/apache/impala/service/Frontend$PlanCtx;Lorg/apache/impala/analysis/AnalysisContext$AnalysisResult;Lorg/apache/impala/util/EventSequence;)Lorg/apache/impala/thrift/TQueryExecRequest; (Frontend.java:3138)
  at org.apache.impala.service.Frontend.doCreateExecRequest(Lorg/apache/impala/service/Frontend$PlanCtx;Ljava/util/List;Lorg/apache/impala/util/EventSequence;)Lorg/apache/impala/thrift/TExecRequest; (Frontend.java:2893)
  at org.apache.impala.service.Frontend.getTExecRequest(Lorg/apache/impala/service/Frontend$PlanCtx;Lorg/apache/impala/util/EventSequence;)Lorg/apache/impala/thrift/TExecRequest; (Frontend.java:2410)
  at org.apache.impala.service.Frontend.createExecRequest(Lorg/apache/impala/service/Frontend$PlanCtx;)Lorg/apache/impala/thrift/TExecRequest; (Frontend.java:2036)
  at org.apache.impala.service.JniFrontend.createExecRequest([B)[B (JniFrontend.java:175){noformat}
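
To get a feel for how tight the byte array limit is at this scale, the following sketch computes the average serialized size each partition may use before a single byte array overflows. It assumes the limit is Integer.MAX_VALUE (2^31 - 1) bytes and that partitions dominate the serialized DescriptorTable; the function name is hypothetical.

```python
# Largest index a Java array can have: Integer.MAX_VALUE.
# (The practical limit in some JVMs is a few bytes lower.)
JVM_MAX_ARRAY_LENGTH = 2**31 - 1


def per_partition_budget(num_partitions, limit=JVM_MAX_ARRAY_LENGTH):
    """Average serialized bytes available per partition under the limit."""
    return limit // num_partitions


budget = per_partition_budget(6_000_000)
print(f"{budget} bytes per partition")
```

Under these assumptions, 6M partitions leave an average of roughly 357 serialized bytes per THdfsPartition, which has to cover the file descriptors, HMS parameters, and storage descriptor as well as the partition key expressions.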
The TDescriptorTable objects occupy 9.67GB of memory. The duplicated partition 
column types waste around 6.7GB (6M * 2 * 600 bytes) of memory.
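
The estimate above can be reproduced with a few lines of arithmetic. The sketch reads the three factors as the number of partitions, the number of duplicated type copies per partition, and the approximate heap footprint of one copy; that reading and the constant names are assumptions. The product comes to about 6.7 when expressed in binary gigabytes (2^30 bytes).

```python
NUM_PARTITIONS = 6_000_000
TYPE_COPIES_PER_PARTITION = 2   # assumed: one per partition key expression
BYTES_PER_TYPE_COPY = 600       # approximate figure quoted in the issue


def wasted_bytes(num_partitions, copies_per_partition, bytes_per_copy):
    """Total heap bytes spent on repeated column type structures."""
    return num_partitions * copies_per_partition * bytes_per_copy


wasted = wasted_bytes(NUM_PARTITIONS, TYPE_COPIES_PER_PARTITION, BYTES_PER_TYPE_COPY)
wasted_gib = wasted / 2**30

print(f"{wasted} bytes, about {wasted_gib:.1f} GiB")
```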

!TDescriptorTable.png|width=960,height=605!


