This is an automated email from the ASF dual-hosted git repository.

dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 457ab9831 IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY 
nested in complex types in select list
457ab9831 is described below

commit 457ab9831afc95b14ccf2a1a9397a923e3b16f8d
Author: Daniel Becker <[email protected]>
AuthorDate: Wed Apr 3 23:43:51 2024 +0200

    IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex 
types in select list
    
    Binary fields in complex types are currently not supported at all for
    regular tables (an error is returned). For Iceberg metadata tables,
    IMPALA-12899 added a temporary workaround to allow queries that contain
    these fields to succeed by NULLing them out. This change adds support
    for displaying them with base64 encoding for both regular and Iceberg
    metadata tables.
    
    Complex types are displayed in JSON format, so simply inserting the
    bytes of the binary fields is not acceptable as it would produce invalid
    JSON. Base64 is a widely used encoding that allows representing
    arbitrary binary information using only a limited set of ASCII
    characters.
    
    This change also adds support for top level binary columns in Iceberg
    metadata tables. However, these are not base64 encoded but are returned
    in raw byte format - this is consistent with how top level binary
    columns from regular (non-metadata) tables are handled.
    
    Testing:
     - added test queries in iceberg-metadata-tables.test referencing both
       nested and top level binary fields; also updated existing queries
     - moved relevant tests (queries extracting binary fields from within
       complex types) from nested-types-scanner-basic.test to a new
       binary-in-complex-type.test file and also added a query that selects
       the containing complex types; this new test file is run from
       test_scanners.py::TestBinaryInComplexType::\
         test_binary_in_complex_type
     - moved negative tests in AnalyzerTest.TestUnsupportedTypes() to
       AnalyzeStmtsTest.TestComplexTypesInSelectList() and converted them to
       positive tests (expecting success); a negative test already in
       AnalyzeStmtsTest.TestComplexTypesInSelectList() was also converted
    
    Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
    Reviewed-on: http://gerrit.cloudera.org:8080/21269
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Csaba Ringhofer <[email protected]>
---
 .../iceberg-metadata/iceberg-metadata-scanner.cc   |  15 ++++
 .../iceberg-metadata/iceberg-metadata-scanner.h    |   8 ++
 be/src/exec/iceberg-metadata/iceberg-row-reader.cc |  96 ++++++++++++++-------
 be/src/exec/iceberg-metadata/iceberg-row-reader.h  |   8 +-
 be/src/rpc/jni-thrift-util.h                       |  16 ++--
 be/src/runtime/complex-value-writer.inline.h       |  15 ++++
 be/src/util/jni-util.cc                            |  35 ++++++--
 be/src/util/jni-util.h                             |  61 +++++++++----
 .../java/org/apache/impala/analysis/Analyzer.java  |   7 --
 .../java/org/apache/impala/analysis/SlotRef.java   |   6 --
 .../apache/impala/util/IcebergMetadataScanner.java |  12 +++
 .../apache/impala/analysis/AnalyzeStmtsTest.java   |  10 ++-
 .../org/apache/impala/analysis/AnalyzerTest.java   |  15 ----
 testdata/data/README                               |  14 +++
 ...f3c1dd3-job_17125053806420_0002-1-00001.parquet | Bin 0 -> 600 bytes
 .../64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro   | Bin 0 -> 3770 bytes
 ...470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro | Bin 0 -> 2158 bytes
 .../metadata/v1.metadata.json                      |  50 +++++++++++
 .../metadata/v2.metadata.json                      |  80 +++++++++++++++++
 .../metadata/version-hint.txt                      |   1 +
 .../functional/functional_schema_template.sql      |  15 ++++
 .../datasets/functional/schema_constraints.csv     |   1 +
 .../queries/QueryTest/binary-in-complex-type.test  |  40 +++++++++
 .../queries/QueryTest/iceberg-metadata-tables.test |  42 +++++----
 .../QueryTest/nested-types-scanner-basic.test      |  34 --------
 tests/query_test/test_scanners.py                  |  15 ++++
 26 files changed, 440 insertions(+), 156 deletions(-)

diff --git a/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc 
b/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
index 514335563..b58e0b3fc 100644
--- a/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
+++ b/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
@@ -42,6 +42,7 @@ Status IcebergMetadataScanner::InitJNI() {
       "java/util/Map$Entry", &map_entry_cl_));
   RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "java/util/List", 
&list_cl_));
   RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "java/util/Map", &map_cl_));
+  RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "[B", &byte_array_cl_));
 
   // Method ids:
   RETURN_IF_ERROR(JniUtil::GetMethodID(env, iceberg_metadata_scanner_cl_,
@@ -73,6 +74,10 @@ Status IcebergMetadataScanner::InitJNI() {
       iceberg_metadata_scanner_collection_scanner_cl_,
       "GetNextCollectionItem", "()Ljava/lang/Object;",
       &iceberg_metadata_scanner_collection_scanner_get_next_collection_item_));
+  RETURN_IF_ERROR(JniUtil::GetMethodID(env, iceberg_metadata_scanner_cl_,
+      "ByteBufferToByteArray", "(Ljava/nio/ByteBuffer;)[B",
+      &iceberg_metadata_scanner_byte_buffer_to_byte_array_));
+
   RETURN_IF_ERROR(JniUtil::GetMethodID(env, map_entry_cl_, "getKey",
       "()Ljava/lang/Object;", &map_entry_get_key_));
   RETURN_IF_ERROR(JniUtil::GetMethodID(env, map_entry_cl_, "getValue",
@@ -239,6 +244,16 @@ Status 
IcebergMetadataScanner::CreateArrayOrMapScanner(JNIEnv* env,
   return Status::OK();
 }
 
+Status IcebergMetadataScanner::ConvertJavaByteBufferToByteArray(JNIEnv* env,
+    const jobject& byte_buffer, jbyteArray* result) {
+  jobject arr = env->CallObjectMethod(jmetadata_scanner_,
+      iceberg_metadata_scanner_byte_buffer_to_byte_array_, byte_buffer);
+  RETURN_ERROR_IF_EXC(env);
+  DCHECK(env->IsInstanceOf(arr, byte_array_cl_));
+  *result = static_cast<jbyteArray>(arr);
+  return Status::OK();
+}
+
 void IcebergMetadataScanner::Close(RuntimeState* state) {
   JNIEnv* env = JniUtil::GetJNIEnv();
   if (env != nullptr) {
diff --git a/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h 
b/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
index e04465083..032b8a77c 100644
--- a/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
+++ b/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
@@ -74,6 +74,11 @@ class IcebergMetadataScanner {
   Status GetNextMapKeyAndValue(JNIEnv* env, const jobject& scanner,
       jobject* key, jobject* value) WARN_UNUSED_RESULT;
 
+  /// Helper function that extracts the contents of a java.nio.ByteBuffer into 
a Java
+  /// primitive byte array. This is used with BINARY fields.
+  Status ConvertJavaByteBufferToByteArray(JNIEnv* env, const jobject& 
byte_buffer,
+      jbyteArray* result) WARN_UNUSED_RESULT;
+
   /// Removes global references.
   void Close(RuntimeState* state);
 
@@ -84,6 +89,7 @@ class IcebergMetadataScanner {
   inline static jclass list_cl_ = nullptr;
   inline static jclass map_cl_ = nullptr;
   inline static jclass map_entry_cl_ = nullptr;
+  inline static jclass byte_array_cl_ = nullptr;
 
   /// Method references created with JniUtil.
   inline static jmethodID iceberg_metadata_scanner_ctor_ = nullptr;
@@ -97,6 +103,8 @@ class IcebergMetadataScanner {
       iceberg_metadata_scanner_collection_scanner_from_map_ = nullptr;
   inline static jmethodID
       iceberg_metadata_scanner_collection_scanner_get_next_collection_item_ = 
nullptr;
+  inline static jmethodID
+      iceberg_metadata_scanner_byte_buffer_to_byte_array_ = nullptr;
 
   inline static jmethodID map_entry_get_key_ = nullptr;
   inline static jmethodID map_entry_get_value_ = nullptr;
diff --git a/be/src/exec/iceberg-metadata/iceberg-row-reader.cc 
b/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
index 4b32feb25..4d259b5a1 100644
--- a/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
+++ b/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
@@ -15,9 +15,12 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#include "exec/iceberg-metadata/iceberg-row-reader.h"
+
+#include <type_traits>
+
 #include "exec/exec-node.inline.h"
 #include "exec/iceberg-metadata/iceberg-metadata-scanner.h"
-#include "exec/iceberg-metadata/iceberg-row-reader.h"
 #include "exec/scan-node.h"
 #include "runtime/collection-value-builder.h"
 #include "runtime/runtime-state.h"
@@ -45,6 +48,8 @@ Status IcebergRowReader::InitJNI() {
   RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "java/lang/Long", 
&long_cl_));
   RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "java/lang/CharSequence",
       &char_sequence_cl_));
+  RETURN_IF_ERROR(JniUtil::GetGlobalClassRef(env, "java/nio/ByteBuffer",
+      &byte_buffer_cl_));
 
   // Method ids:
   RETURN_IF_ERROR(JniUtil::GetMethodID(env, list_cl_, "size", "()I", 
&list_size_));
@@ -106,12 +111,13 @@ Status IcebergRowReader::WriteSlot(JNIEnv* env, const 
jobject* struct_like_row,
     } case TYPE_TIMESTAMP: { // org.apache.iceberg.types.TimestampType
       RETURN_IF_ERROR(WriteTimeStampSlot(env, accessed_value, slot));
       break;
-    } case TYPE_STRING: { // java.lang.String
-      if (type.IsBinaryType()) {
-        // TODO IMPALA-12651,IMPALA-11491: Display BINARY correctly instead of 
NULLing it.
-        tuple->SetNull(slot_desc->null_indicator_offset());
-      } else {
-        RETURN_IF_ERROR(WriteStringSlot(env, accessed_value, slot, 
tuple_data_pool));
+    } case TYPE_STRING: {
+      if (type.IsBinaryType()) { // byte[]
+        RETURN_IF_ERROR(WriteStringOrBinarySlot</* IS_BINARY */ true>(
+            env, accessed_value, slot, tuple_data_pool));
+      } else { // java.lang.String
+        RETURN_IF_ERROR(WriteStringOrBinarySlot</* IS_BINARY */ false>(
+            env, accessed_value, slot, tuple_data_pool));
       }
       break;
     } case TYPE_STRUCT: { // Struct type is not used by Impala to access 
values.
@@ -129,9 +135,8 @@ Status IcebergRowReader::WriteSlot(JNIEnv* env, const 
jobject* struct_like_row,
       break;
     }
     default:
-      // Skip the unsupported type and set it to NULL
+      DCHECK(false) << "Unsupported column type: " << slot_desc->type().type;
       tuple->SetNull(slot_desc->null_indicator_offset());
-      VLOG(3) << "Skipping unsupported column type: " << 
slot_desc->type().type;
   }
   return Status::OK();
 }
@@ -194,25 +199,46 @@ Status IcebergRowReader::WriteTimeStampSlot(JNIEnv* env, 
const jobject &accessed
   return Status::OK();
 }
 
-Status IcebergRowReader::WriteStringSlot(JNIEnv* env, const jobject 
&accessed_value,
-    void* slot, MemPool* tuple_data_pool) {
+/// To obtain bytes from JNI the JniByteArrayGuard or the JniUtfCharGuard 
class is used.
+/// Then the data has to be copied to the tuple_data_pool, because the JVM 
releases the
+/// reference and reclaims the memory area.
+template <bool IS_BINARY>
+Status IcebergRowReader::WriteStringOrBinarySlot(JNIEnv* env,
+    const jobject &accessed_value, void* slot, MemPool* tuple_data_pool) {
+  using jbufferType = typename std::conditional<IS_BINARY, jbyteArray, 
jstring>::type;
+  using GuardType = typename std::conditional<
+      IS_BINARY, JniByteArrayGuard, JniUtfCharGuard>::type;
+  const jclass& jobject_subclass = IS_BINARY ? byte_buffer_cl_ : 
char_sequence_cl_;
+
   DCHECK(accessed_value != nullptr);
-  DCHECK(env->IsInstanceOf(accessed_value, char_sequence_cl_) == JNI_TRUE);
-  jstring result = static_cast<jstring>(env->CallObjectMethod(accessed_value,
-      char_sequence_to_string_));
-  RETURN_ERROR_IF_EXC(env);
-  JniUtfCharGuard str_guard;
-  RETURN_IF_ERROR(JniUtfCharGuard::create(env, result, &str_guard));
-  // Allocate memory and copy the string from the JVM to the RowBatch
-  int str_len = strlen(str_guard.get());
-  char* buffer = 
reinterpret_cast<char*>(tuple_data_pool->TryAllocateUnaligned(str_len));
+  DCHECK(env->IsInstanceOf(accessed_value, jobject_subclass) == JNI_TRUE);
+
+  jbufferType jbuffer;
+  if constexpr (IS_BINARY) {
+    RETURN_IF_ERROR(metadata_scanner_->ConvertJavaByteBufferToByteArray(
+        env, accessed_value, &jbuffer));
+  } else {
+    jbuffer = static_cast<jstring>(env->CallObjectMethod(accessed_value,
+        char_sequence_to_string_));
+    RETURN_ERROR_IF_EXC(env);
+  }
+
+  GuardType jbuffer_guard;
+  RETURN_IF_ERROR(GuardType::create(env, jbuffer, &jbuffer_guard));
+  uint32_t jbuffer_size = jbuffer_guard.get_size();
+
+  // Allocate memory and copy the bytes from the JVM to the RowBatch.
+  char* buffer = reinterpret_cast<char*>(
+      tuple_data_pool->TryAllocateUnaligned(jbuffer_size));
   if (UNLIKELY(buffer == nullptr)) {
-    string details = strings::Substitute("Failed to allocate $1 bytes for 
string.",
-        str_len);
-    return tuple_data_pool->mem_tracker()->MemLimitExceeded(nullptr, details, 
str_len);
+    string details = strings::Substitute("Failed to allocate $0 bytes for $1.",
+        jbuffer_size, IS_BINARY ? "binary" : "string");
+    return tuple_data_pool->mem_tracker()->MemLimitExceeded(
+        nullptr, details, jbuffer_size);
   }
-  memcpy(buffer, str_guard.get(), str_len);
-  reinterpret_cast<StringValue*>(slot)->Assign(buffer, str_len);
+
+  memcpy(buffer, jbuffer_guard.get(), jbuffer_size);
+  reinterpret_cast<StringValue*>(slot)->Assign(buffer, jbuffer_size);
   return Status::OK();
 }
 
@@ -339,19 +365,23 @@ Status IcebergRowReader::WriteMapKeyAndValue(JNIEnv* env, 
const jobject& map_sca
 
 jclass IcebergRowReader::JavaClassFromImpalaType(const ColumnType type) {
   switch (type.type) {
-    case TYPE_BOOLEAN: {     // java.lang.Boolean
+    case TYPE_BOOLEAN: {         // java.lang.Boolean
       return boolean_cl_;
     } case TYPE_DATE:
-      case TYPE_INT: {       // java.lang.Integer
+      case TYPE_INT: {           // java.lang.Integer
       return integer_cl_;
-    } case TYPE_BIGINT:      // java.lang.Long
-      case TYPE_TIMESTAMP: { // org.apache.iceberg.types.TimestampType
+    } case TYPE_BIGINT:          // java.lang.Long
+      case TYPE_TIMESTAMP: {     // org.apache.iceberg.types.TimestampType
       return long_cl_;
-    } case TYPE_STRING: {    // java.lang.String
-      return char_sequence_cl_;
-    } case TYPE_ARRAY: {     // java.util.List
+    } case TYPE_STRING: {
+      if (type.IsBinaryType()) { // java.nio.ByteBuffer
+        return byte_buffer_cl_;
+      } else {                   // java.lang.CharSequence
+        return char_sequence_cl_;
+      }
+    } case TYPE_ARRAY: {         // java.util.List
       return list_cl_;
-    } case TYPE_MAP: {       // java.util.Map
+    } case TYPE_MAP: {           // java.util.Map
       return map_cl_;
     }
     default:
diff --git a/be/src/exec/iceberg-metadata/iceberg-row-reader.h 
b/be/src/exec/iceberg-metadata/iceberg-row-reader.h
index f9f665499..34f21d773 100644
--- a/be/src/exec/iceberg-metadata/iceberg-row-reader.h
+++ b/be/src/exec/iceberg-metadata/iceberg-row-reader.h
@@ -58,6 +58,7 @@ class IcebergRowReader {
   inline static jclass integer_cl_ = nullptr;
   inline static jclass long_cl_ = nullptr;
   inline static jclass char_sequence_cl_ = nullptr;
+  inline static jclass byte_buffer_cl_ = nullptr;
 
   /// Method references created with JniUtil.
   inline static jmethodID list_size_ = nullptr;
@@ -67,7 +68,6 @@ class IcebergRowReader {
   inline static jmethodID long_value_ = nullptr;
   inline static jmethodID char_sequence_to_string_ = nullptr;
 
-
   /// The scan node that started this row reader.
   ScanNode* scan_node_;
 
@@ -94,10 +94,8 @@ class IcebergRowReader {
   /// Iceberg TimeStamp is parsed into TimestampValue.
   Status WriteTimeStampSlot(JNIEnv* env, const jobject &accessed_value, void* 
slot)
       WARN_UNUSED_RESULT;
-  /// To obtain a character sequence from JNI the JniUtfCharGuard class is 
used. Then the
-  /// data has to be copied to the tuple_data_pool, because the JVM releases 
the reference
-  /// and reclaims the memory area.
-  Status WriteStringSlot(JNIEnv* env, const jobject &accessed_value, void* 
slot,
+  template <bool IS_BINARY>
+  Status WriteStringOrBinarySlot(JNIEnv* env, const jobject &accessed_value, 
void* slot,
       MemPool* tuple_data_pool) WARN_UNUSED_RESULT;
 
   /// Nested types recursively call MaterializeTuple method with their child 
tuple.
diff --git a/be/src/rpc/jni-thrift-util.h b/be/src/rpc/jni-thrift-util.h
index c2e5c0bca..77b80b9ec 100644
--- a/be/src/rpc/jni-thrift-util.h
+++ b/be/src/rpc/jni-thrift-util.h
@@ -54,17 +54,11 @@ Status SerializeThriftMsg(JNIEnv* env, T* msg, jbyteArray* 
serialized_msg) {
 
 template <class T>
 Status DeserializeThriftMsg(JNIEnv* env, jbyteArray serialized_msg, T* 
deserialized_msg) {
-  jboolean is_copy = false;
-  uint32_t buf_size = env->GetArrayLength(serialized_msg);
-  jbyte* buf = env->GetByteArrayElements(serialized_msg, &is_copy);
-
-  Status status = DeserializeThriftMsg(
-          reinterpret_cast<uint8_t*>(buf), &buf_size, false, deserialized_msg);
-
-  /// Return buffer back. JNI_ABORT indicates to not copy contents back to java
-  /// side.
-  env->ReleaseByteArrayElements(serialized_msg, buf, JNI_ABORT);
-  return status;
+  JniByteArrayGuard guard;
+  RETURN_IF_ERROR(JniByteArrayGuard::create(env, serialized_msg, &guard));
+  uint32_t buf_size = guard.get_size();
+  return DeserializeThriftMsg(reinterpret_cast<const uint8_t*>(guard.get()), 
&buf_size,
+      false, deserialized_msg);
 }
 
 }
diff --git a/be/src/runtime/complex-value-writer.inline.h 
b/be/src/runtime/complex-value-writer.inline.h
index c53325e1d..e22dc6bf1 100644
--- a/be/src/runtime/complex-value-writer.inline.h
+++ b/be/src/runtime/complex-value-writer.inline.h
@@ -20,6 +20,7 @@
 #include <string>
 
 #include "runtime/raw-value.inline.h"
+#include "util/coding-util.h"
 
 namespace impala {
 
@@ -58,6 +59,20 @@ void 
ComplexValueWriter<JsonStream>::PrimitiveValueToJSON(void* value,
   RawValue::PrintValue(value, type, scale, &tmp);
   const bool should_convert_to_string = map_key && stringify_map_keys_;
   if (IsPrimitiveTypePrintedAsString(type) || should_convert_to_string) {
+    if (type.IsBinaryType()) {
+      int64_t base64_max_len;
+      bool succ = Base64EncodeBufLen(tmp.size(), &base64_max_len);
+      DCHECK(succ);
+
+      // 'base64_max_len' includes the null terminator.
+      string buf(base64_max_len - 1, '\0');
+      unsigned base64_len = 0;
+      succ = Base64Encode(tmp.c_str(), tmp.size(), base64_max_len, buf.data(),
+          &base64_len);
+      DCHECK(succ);
+
+      tmp = std::move(buf);
+    }
     writer_->String(tmp.c_str());
   } else {
     writer_->RawValue(tmp.c_str(), tmp.size(),
diff --git a/be/src/util/jni-util.cc b/be/src/util/jni-util.cc
index 5b7dc19bd..70931bab3 100644
--- a/be/src/util/jni-util.cc
+++ b/be/src/util/jni-util.cc
@@ -32,25 +32,44 @@ DEFINE_int64(jvm_deadlock_detector_interval_s, 60,
 
 namespace impala {
 
-Status JniUtfCharGuard::create(JNIEnv* env, jstring jstr, JniUtfCharGuard* 
out) {
-  DCHECK(jstr != nullptr);
+template <class T>
+Status JniBufferGuard<T>::create(JNIEnv* env, T jbuffer, JniBufferGuard<T>* 
out) {
+  DCHECK(jbuffer != nullptr);
   DCHECK(!env->ExceptionCheck());
   jboolean is_copy;
-  const char* utf_chars = env->GetStringUTFChars(jstr, &is_copy);
+  const char* buffer = nullptr;
+  uint32_t size = -1;
+  if constexpr (std::is_same_v<T, jstring>) {
+    size = env->GetStringLength(jbuffer);
+    buffer = env->GetStringUTFChars(jbuffer, &is_copy);
+  } else {
+    static_assert(std::is_same_v<T, jbyteArray>);
+    size = env->GetArrayLength(jbuffer);
+    buffer = reinterpret_cast<char*>(env->GetByteArrayElements(jbuffer, 
&is_copy));
+  }
+  DCHECK_NE(size, -1);
+
   bool exception_check = static_cast<bool>(env->ExceptionCheck());
-  if (utf_chars == nullptr || exception_check) {
+  if (buffer == nullptr || exception_check) {
     if (exception_check) env->ExceptionClear();
-    if (utf_chars != nullptr) env->ReleaseStringUTFChars(jstr, utf_chars);
-    auto fail_message = "GetStringUTFChars failed. Probable OOM on JVM side";
+    if (buffer != nullptr) Release(env, jbuffer, buffer);
+    std::string fail_message = Substitute("$0 $1",
+        std::is_same_v<T, jstring> ? "GetStringUTFChars" : 
"GetByteArrayElements",
+        "failed. Probable OOM on JVM side.");
     LOG(ERROR) << fail_message;
     return Status(fail_message);
   }
   out->env = env;
-  out->jstr = jstr;
-  out->utf_chars = utf_chars;
+  out->jbuffer = jbuffer;
+  out->size = size;
+  out->buffer = buffer;
   return Status::OK();
 }
 
+// JniBufferGuard<jstring> is instantiated implicitly because functions in the 
header use
+// it, but JniBufferGuard<jbyteArray> needs to be instantiated explicitly.
+template class JniBufferGuard<jbyteArray>;
+
 bool JniScopedArrayCritical::Create(JNIEnv* env, jbyteArray jarr,
     JniScopedArrayCritical* out) {
   DCHECK(env != nullptr);
diff --git a/be/src/util/jni-util.h b/be/src/util/jni-util.h
index 85dfc57b9..55e313a7e 100644
--- a/be/src/util/jni-util.h
+++ b/be/src/util/jni-util.h
@@ -107,32 +107,59 @@ struct JniMethodDescriptor {
   jmethodID* method_id;
 };
 
-/// Helper class for lifetime management of chars from JNI, releasing JNI 
chars when
-/// destructed
-class JniUtfCharGuard {
+/// Helper class for the lifetime management of JNI char or byte buffers. 
Releases the JNI
+/// buffer when destructed. T must be either jstring or jbyteArray. If T is a 
jstring, the
+/// string in the buffer is in utf-8 format.
+///
+/// See also JniScopedArrayCritical, which can also be used for byte buffers 
but its usage
+/// is more restricted, see 
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical
+template <class T>
+class JniBufferGuard {
+  static_assert(std::is_same_v<T, jstring> || std::is_same_v<T, jbyteArray>);
  public:
-  /// Construct a JniUtfCharGuards holding nothing
-  JniUtfCharGuard() : utf_chars(nullptr) {}
+  /// Construct a JniBufferGuard holding nothing.
+  JniBufferGuard() : env(nullptr), jbuffer{}, size(0), buffer(nullptr)  {}
 
-  /// Release the held char sequence if there is one.
-  ~JniUtfCharGuard() {
-    if (utf_chars != nullptr) env->ReleaseStringUTFChars(jstr, utf_chars);
+  /// Release the held JNI buffer if there is one.
+  ~JniBufferGuard() {
+    Release(env, jbuffer, buffer);
   }
 
-  /// Try to get chars from jstr. If error is returned, utf_chars and get() 
remain
-  /// to be nullptr, otherwise they point to a valid char sequence. The char 
sequence
-  /// lives as long as this guard. jstr should not be null.
-  static Status create(JNIEnv* env, jstring jstr, JniUtfCharGuard* out);
+  /// Try to get chars/bytes from jbuffer. If an error is returned, buffer and 
get()
+  /// remain 'nullptr's, otherwise they point to a valid char/byte array. The 
char/byte
+  /// array lives as long as this guard. jbuffer should not be null.
+  static Status create(JNIEnv* env, T jbuffer, JniBufferGuard* out);
+
+  /// Get the size of the buffer.
+  uint32_t get_size() { return size; }
 
-  /// Get the char sequence. Returns nullptr if the guard does hold a char 
sequence.
-  const char* get() { return utf_chars; }
+  /// Get the buffer. Returns nullptr if the guard does not hold a buffer.
+  const char* get() { return buffer; }
  private:
   JNIEnv* env;
-  jstring jstr;
-  const char* utf_chars;
-  DISALLOW_COPY_AND_ASSIGN(JniUtfCharGuard);
+  T jbuffer;
+  uint32_t size;
+  const char* buffer;
+  DISALLOW_COPY_AND_ASSIGN(JniBufferGuard);
+
+  static void Release(JNIEnv* env, T jbuffer, const char* buffer) {
+    if (buffer != nullptr) {
+      if constexpr (std::is_same_v<T, jstring>) {
+        env->ReleaseStringUTFChars(jbuffer, buffer);
+      } else {
+        static_assert(std::is_same_v<T, jbyteArray>);
+        /// Return buffer back. JNI_ABORT indicates to not copy contents back 
to java
+        /// side.
+        env->ReleaseByteArrayElements(jbuffer,
+            const_cast<jbyte*>(reinterpret_cast<const jbyte*>(buffer)), 
JNI_ABORT);
+      }
+    }
+  }
 };
 
+using JniUtfCharGuard = JniBufferGuard<jstring>;
+using JniByteArrayGuard = JniBufferGuard<jbyteArray>;
+
 class JniScopedArrayCritical {
  public:
   /// Construct a JniScopedArrayCritical holding nothing.
diff --git a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java 
b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
index 08689d2c7..d4109c4f5 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
@@ -1801,13 +1801,6 @@ public class Analyzer {
     boolean isResolved = resolvedPath.resolve();
     Preconditions.checkState(isResolved);
 
-    if (resolvedPath.destType().isBinary() &&
-        !collTblRef.getResolvedPath().comesFromIcebergMetadataTable()) {
-      // We allow BINARY fields in collections from Iceberg metadata tables 
but NULL them
-      // out.
-      throw new AnalysisException(
-          "Binary type inside collection types is not supported 
(IMPALA-11491).");
-    }
     registerSlotRef(resolvedPath, false);
   }
 
diff --git a/fe/src/main/java/org/apache/impala/analysis/SlotRef.java 
b/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
index 905b080ba..3208b5451 100644
--- a/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
@@ -214,12 +214,6 @@ public class SlotRef extends Expr {
         throw new AnalysisException("Unsupported type '"
             + fieldType.toSql() + "' in '" + toSql() + "'.");
       }
-      if (fieldType.isBinary() && 
!desc_.getPath().comesFromIcebergMetadataTable()) {
-        // We allow BINARY fields in collections from Iceberg metadata tables 
but NULL
-        // them out.
-        throw new AnalysisException("Struct containing a BINARY type is not " +
-            "allowed in the select list (IMPALA-11491).");
-      }
 
       if (fieldType.isStructType()) {
         Preconditions.checkState(child instanceof SlotRef);
diff --git 
a/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java 
b/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
index b2669976d..9ff415104 100644
--- a/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
+++ b/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
@@ -20,6 +20,7 @@ package org.apache.impala.util;
 import com.cloudera.cloud.storage.relocated.protobuf.Struct;
 import com.google.common.base.Preconditions;
 
+import java.nio.ByteBuffer;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
@@ -131,6 +132,17 @@ public class IcebergMetadataScanner {
     return structLike.get(pos, javaClass);
   }
 
+
+  /**
+   * Extracts the contents of a ByteBuffer into a byte array.
+   */
+  public byte[] ByteBufferToByteArray(ByteBuffer buffer) {
+    int length = buffer.remaining();
+    byte[] res = new byte[length];
+    buffer.get(res);
+    return res;
+  }
+
   /**
    * Wrapper around an array or a map that is the result of a metadata table 
scan.
    * It is used to avoid iterating over a list through JNI.
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java 
b/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
index 6da0a5b45..6037d45d8 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
@@ -1061,6 +1061,12 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
         "Unable to INSERT into target table (default.new_tbl) because the 
column " +
             "'tiny_struct' has a complex type 'STRUCT<b:BOOLEAN>' and Impala 
doesn't " +
             "support inserting into tables containing complex type columns");
+    // Binary in complex types is also supported.
+    AnalyzesOk("select binary_item_col from 
functional_parquet.binary_in_complex_types");
+    AnalyzesOk(
+        "select binary_member_col from 
functional_parquet.binary_in_complex_types");
+    AnalyzesOk("select binary_key_col from 
functional_parquet.binary_in_complex_types");
+    AnalyzesOk("select binary_value_col from 
functional_parquet.binary_in_complex_types");
 
     //Make complex types available in star queries
     ctx.getQueryOptions().setExpand_complex_types(true);
@@ -1085,9 +1091,7 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
     // Allow also structs in collections and vice versa.
     AnalyzesOk("select * from functional_parquet.allcomplextypes", ctx);
     AnalyzesOk("select * from functional_orc_def.complextypestbl", ctx);
-
-    AnalysisError("select * from functional_parquet.binary_in_complex_types", 
ctx,
-        "Binary type inside collection types is not supported 
(IMPALA-11491).");
+    AnalyzesOk("select * from functional_parquet.binary_in_complex_types", 
ctx);
   }
 
   @Test
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 
b/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
index 0882bfb64..aa761a239 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
@@ -294,21 +294,6 @@ public class AnalyzerTest extends FrontendTestBase {
         "Failed to load metadata for table: 
'functional.unsupported_binary_partition'");
     // Try with hbase
     AnalyzesOk("describe functional_hbase.allcomplextypes");
-    // Returning complex types with BINARY in select list is not yet 
implemented
-    // (IMPALA-11491). Note that this is also problematic in Hive (HIVE-26454).
-    AnalysisError(
-        "select binary_item_col from 
functional_parquet.binary_in_complex_types",
-        "Binary type inside collection types is not supported 
(IMPALA-11491).");
-    AnalysisError(
-        "select binary_member_col from 
functional_parquet.binary_in_complex_types",
-        "Struct containing a BINARY type is not allowed in the select list " +
-        "(IMPALA-11491).");
-    AnalysisError(
-        "select binary_key_col from 
functional_parquet.binary_in_complex_types",
-        "Binary type inside collection types is not supported 
(IMPALA-11491).");
-    AnalysisError(
-        "select binary_value_col from 
functional_parquet.binary_in_complex_types",
-        "Binary type inside collection types is not supported 
(IMPALA-11491).");
 
     for (ScalarType t: Type.getUnsupportedTypes()) {
       // Create/Alter table.
diff --git a/testdata/data/README b/testdata/data/README
index 2e1970cf5..61eedb10e 100644
--- a/testdata/data/README
+++ b/testdata/data/README
@@ -984,6 +984,20 @@ If a snapshot has multiple manifest files, then you need 
to change manually the
 *_json files and transform it back using avro tools, the last step of 
avro_iceberg_convert.sh, or
 use testdata/bin/rewrite-iceberg-metadata.py
 
+iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata:
+Created by the following steps:
+ - saved the HDFS directory of 'iceberg_v2_no_deletes' to local
+   ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice
+ - converted the avro manifest file to json
+ - manually replaced the 'null' value for "key_metadata" with "{"bytes" :
+   "binary_key_metadata"}"
+ - converted the modified json file back to avro.
+ - adjusted the length of the manifest file in the avro snapshot file
+
+The commands for converting the avro file to json and back are listed under
+'iceberg_v2_no_deletes' in the script avro_iceberg_convert.sh. Adjusting the 
length is
+described after the script.
+
 iceberg_v2_partitioned_position_deletes:
 iceberg_v2_partitioned_position_deletes_orc:
 Created similarly to iceberg_v2_no_deletes.
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/00000-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-00001.parquet
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/00000-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-00001.parquet
new file mode 100644
index 000000000..de903bb54
Binary files /dev/null and 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/data/00000-0-data-danielbecker_20240408174043_c3737eaf-db30-4b88-aafb-f23c0f3c1dd3-job_17125053806420_0002-1-00001.parquet
 differ
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
new file mode 100644
index 000000000..78b7deb8a
Binary files /dev/null and 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/64da0e56-efa3-4025-bef1-1047fdd9a2b0-m0.avro
 differ
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
new file mode 100644
index 000000000..b09517922
Binary files /dev/null and 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro
 differ
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
new file mode 100644
index 000000000..6ba427491
--- /dev/null
+++ 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v1.metadata.json
@@ -0,0 +1,50 @@
+{
+  "format-version" : 2,
+  "table-uuid" : "8dc6c024-06c2-4812-981f-8df7f0f07973",
+  "location" : 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata",
+  "last-sequence-number" : 0,
+  "last-updated-ms" : 1712590842243,
+  "last-column-id" : 2,
+  "current-schema-id" : 0,
+  "schemas" : [ {
+    "type" : "struct",
+    "schema-id" : 0,
+    "fields" : [ {
+      "id" : 1,
+      "name" : "i",
+      "required" : false,
+      "type" : "int"
+    }, {
+      "id" : 2,
+      "name" : "s",
+      "required" : false,
+      "type" : "string"
+    } ]
+  } ],
+  "default-spec-id" : 0,
+  "partition-specs" : [ {
+    "spec-id" : 0,
+    "fields" : [ ]
+  } ],
+  "last-partition-id" : 999,
+  "default-sort-order-id" : 0,
+  "sort-orders" : [ {
+    "order-id" : 0,
+    "fields" : [ ]
+  } ],
+  "properties" : {
+    "engine.hive.enabled" : "true",
+    "write.merge.mode" : "merge-on-read",
+    "write.delete.mode" : "merge-on-read",
+    "bucketing_version" : "2",
+    "write.update.mode" : "merge-on-read",
+    "serialization.format" : "1",
+    "storage_handler" : "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
+  },
+  "current-snapshot-id" : -1,
+  "refs" : { },
+  "snapshots" : [ ],
+  "statistics" : [ ],
+  "snapshot-log" : [ ],
+  "metadata-log" : [ ]
+}
\ No newline at end of file
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
new file mode 100644
index 000000000..7d18018df
--- /dev/null
+++ 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/v2.metadata.json
@@ -0,0 +1,80 @@
+{
+  "format-version" : 2,
+  "table-uuid" : "8dc6c024-06c2-4812-981f-8df7f0f07973",
+  "location" : 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata",
+  "last-sequence-number" : 1,
+  "last-updated-ms" : 1712590853544,
+  "last-column-id" : 2,
+  "current-schema-id" : 0,
+  "schemas" : [ {
+    "type" : "struct",
+    "schema-id" : 0,
+    "fields" : [ {
+      "id" : 1,
+      "name" : "i",
+      "required" : false,
+      "type" : "int"
+    }, {
+      "id" : 2,
+      "name" : "s",
+      "required" : false,
+      "type" : "string"
+    } ]
+  } ],
+  "default-spec-id" : 0,
+  "partition-specs" : [ {
+    "spec-id" : 0,
+    "fields" : [ ]
+  } ],
+  "last-partition-id" : 999,
+  "default-sort-order-id" : 0,
+  "sort-orders" : [ {
+    "order-id" : 0,
+    "fields" : [ ]
+  } ],
+  "properties" : {
+    "engine.hive.enabled" : "true",
+    "write.merge.mode" : "merge-on-read",
+    "write.delete.mode" : "merge-on-read",
+    "bucketing_version" : "2",
+    "write.update.mode" : "merge-on-read",
+    "serialization.format" : "1",
+    "storage_handler" : "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"
+  },
+  "current-snapshot-id" : 3079551887386250470,
+  "refs" : {
+    "main" : {
+      "snapshot-id" : 3079551887386250470,
+      "type" : "branch"
+    }
+  },
+  "snapshots" : [ {
+    "sequence-number" : 1,
+    "snapshot-id" : 3079551887386250470,
+    "timestamp-ms" : 1712590853544,
+    "summary" : {
+      "operation" : "append",
+      "added-data-files" : "1",
+      "added-records" : "3",
+      "added-files-size" : "600",
+      "changed-partition-count" : "1",
+      "total-records" : "3",
+      "total-files-size" : "600",
+      "total-data-files" : "1",
+      "total-delete-files" : "0",
+      "total-position-deletes" : "0",
+      "total-equality-deletes" : "0"
+    },
+    "manifest-list" : 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/snap-3079551887386250470-1-64da0e56-efa3-4025-bef1-1047fdd9a2b0.avro",
+    "schema-id" : 0
+  } ],
+  "statistics" : [ ],
+  "snapshot-log" : [ {
+    "timestamp-ms" : 1712590853544,
+    "snapshot-id" : 3079551887386250470
+  } ],
+  "metadata-log" : [ {
+    "timestamp-ms" : 1712590842243,
+    "metadata-file" : 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/00000-b66acea3-39d8-4f9d-a326-0ce073f45944.metadata.json"
+  } ]
+}
\ No newline at end of file
diff --git 
a/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
new file mode 100644
index 000000000..0cfbf0888
--- /dev/null
+++ 
b/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata/metadata/version-hint.txt
@@ -0,0 +1 @@
+2
diff --git a/testdata/datasets/functional/functional_schema_template.sql 
b/testdata/datasets/functional/functional_schema_template.sql
index 0c5fab271..0290e6ba0 100644
--- a/testdata/datasets/functional/functional_schema_template.sql
+++ b/testdata/datasets/functional/functional_schema_template.sql
@@ -3893,6 +3893,21 @@ SELECT * FROM  
{db_name}{db_suffix}.iceberg_query_metadata;
 ---- DATASET
 functional
 ---- BASE_TABLE_NAME
+iceberg_with_key_metadata
+---- CREATE
+CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
+STORED AS ICEBERG
+TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
+              
'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
+              'iceberg.table_identifier'='ice.iceberg_with_key_metadata',
+              'format-version'='2');
+---- DEPENDENT_LOAD
+`hadoop fs -mkdir -p /test-warehouse/iceberg_test/hadoop_catalog/ice && \
+hadoop fs -put -f 
${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_with_key_metadata
 /test-warehouse/iceberg_test/hadoop_catalog/ice
+====
+---- DATASET
+functional
+---- BASE_TABLE_NAME
 iceberg_lineitem_multiblock
 ---- CREATE
 CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
diff --git a/testdata/datasets/functional/schema_constraints.csv 
b/testdata/datasets/functional/schema_constraints.csv
index be3c1f1e2..84046aa7b 100644
--- a/testdata/datasets/functional/schema_constraints.csv
+++ b/testdata/datasets/functional/schema_constraints.csv
@@ -104,6 +104,7 @@ table_name:iceberg_multiple_storage_locations, 
constraint:restrict_to, table_for
 table_name:iceberg_avro_format, constraint:restrict_to, 
table_format:parquet/none/none
 table_name:iceberg_mixed_file_format, constraint:restrict_to, 
table_format:parquet/none/none
 table_name:iceberg_test_metadata, constraint:restrict_to, 
table_format:parquet/none/none
+table_name:iceberg_with_key_metadata, constraint:restrict_to, 
table_format:parquet/none/none
 table_name:iceberg_lineitem_multiblock, constraint:restrict_to, 
table_format:parquet/none/none
 table_name:iceberg_lineitem_sixblocks, constraint:restrict_to, 
table_format:parquet/none/none
 table_name:iceberg_spark_compaction_with_dangling_delete, 
constraint:restrict_to, table_format:parquet/none/none
diff --git 
a/testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
 
b/testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
new file mode 100644
index 000000000..349a0d659
--- /dev/null
+++ 
b/testdata/workloads/functional-query/queries/QueryTest/binary-in-complex-type.test
@@ -0,0 +1,40 @@
+====
+---- QUERY
+# Tests specifically for BINARY type in complex types.
+select binary_member_col.b from binary_in_complex_types
+---- TYPES
+BINARY
+---- RESULTS
+'member'
+====
+---- QUERY
+select a.item from binary_in_complex_types t, t.binary_item_col a
+---- TYPES
+BINARY
+---- RESULTS
+'item1'
+'item2'
+====
+---- QUERY
+select m.key, m.value from binary_in_complex_types t, t.binary_key_col m
+---- TYPES
+BINARY,INT
+---- RESULTS
+'key1',1
+'key2',2
+====
+---- QUERY
+select m.key, m.value from binary_in_complex_types t, t.binary_value_col m
+---- TYPES
+INT,BINARY
+---- RESULTS
+1,'value1'
+2,'value2'
+====
+---- QUERY
+select binary_item_col, binary_key_col, binary_value_col, binary_member_col 
from binary_in_complex_types
+---- TYPES
+STRING,STRING,STRING,STRING
+---- RESULTS
+'["aXRlbTE=","aXRlbTI="]','{"a2V5MQ==":1,"a2V5Mg==":2}','{1:"dmFsdWUx",2:"dmFsdWUy"}','{"i":0,"b":"bWVtYmVy"}'
+====
diff --git 
a/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
 
b/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
index 50c3fe2ea..538cb3463 100644
--- 
a/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
+++ 
b/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
@@ -778,37 +778,45 @@ select column_sizes, value_counts, split_offsets, 
equality_ids from functional_p
 STRING,STRING,STRING,STRING
 ====
 
+####
+# Query top-level BINARY columns;
+####
+---- QUERY
+select key_metadata from functional_parquet.iceberg_with_key_metadata.`files`;
+---- RESULTS
+'binary_key_metadata'
+---- TYPES
+BINARY
+====
+
 ####
 # Query BINARY elements of complex types;
 ####
 ---- QUERY
-select lower_bounds, upper_bounds from 
functional_parquet.iceberg_query_metadata.all_files;
+select lower_bounds, upper_bounds from 
functional_parquet.iceberg_v2_no_deletes.all_files;
 ---- RESULTS
-'{1:null}','{1:null}'
-'{1:null}','{1:null}'
-'{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}'
-'{1:null}','{1:null}'
+'{1:"AQAAAA==",2:"eA=="}','{1:"AwAAAA==",2:"eg=="}'
 ---- TYPES
 STRING,STRING
 ====
 ---- QUERY
-select data_file from functional_parquet.iceberg_query_metadata.entries;
----- RESULTS
-row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:null},"upper_bounds":{1:null},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
-row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:null},"upper_bounds":{1:null},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
-row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:null},"upper_bounds":{1:null},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
-row_regex:'{"content":1,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{2147483546:[1-9][0-9]*,2147483545:[1-9][0-9]*},"value_counts":{2147483546:1,2147483545:1},"null_value_counts":{2147483546:0,2147483545:0},"nan_value_counts":null,"lower_bounds":{2147483546:null,2147483545:null},"upper_bounds":{2147483546:null,214748354
 [...]
+# Filter out position delete files because they contain filenames that vary by 
dataload.
+select data_file from functional_parquet.iceberg_query_metadata.entries where 
data_file.content != 1;
+---- RESULTS : VERIFY_IS_SUBSET
+row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:"AwAAAA=="},"upper_bounds":{1:"AwAAAA=="},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
+row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:"AgAAAA=="},"upper_bounds":{1:"AgAAAA=="},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
+row_regex:'{"content":0,"file_path":".*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*_data.0.parq","file_format":"PARQUET","spec_id":0,"record_count":1,"file_size_in_bytes":[1-9][0-9]*,"column_sizes":{1:47},"value_counts":{1:1},"null_value_counts":{1:0},"nan_value_counts":null,"lower_bounds":{1:"AQAAAA=="},"upper_bounds":{1:"AQAAAA=="},"key_metadata":null,"split_offsets":null,"equality_ids":null,"sort_order_id":0}'
 ---- TYPES
 STRING
 ====
 ---- QUERY
-select * from functional_parquet.iceberg_v2_delete_both_eq_and_pos.all_files;
+# Filter out position delete files because they contain filenames that vary by 
dataload.
+select * from functional_parquet.iceberg_v2_delete_both_eq_and_pos.all_files 
where content != 1;
 ---- RESULTS
-1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e00000001_800513971_data.0.parq','PARQUET',0,1,1606,'{2147483546:215,2147483545:51}','{2147483546:1,2147483545:1}','{2147483546:0,2147483545:0}','NULL','{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}','NULL','NULL','NULL',NULL,'{"d":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bou
 [...]
-2,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00002.parquet','PARQUET',0,2,697,'{1:40,3:66}','{1:2,3:2}','{1:0,3:0}','{}','{1:null,3:null}','{1:null,3:null}','NULL','[4]','[1,3]',0,'{"d":{"column_size":66,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-13"},"i":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"
 [...]
-0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00001.parquet','PARQUET',0,2,885,'{1:40,2:62,3:40}','{1:2,2:2,3:2}','{1:0,2:0,3:0}','{}','{1:null,2:null,3:null}','{1:null,2:null,3:null}','NULL','[4]','NULL',0,'{"d":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-23"},"i":{"column_size":40,"value_count":2,"null_value_count":
 [...]
-0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00001.parquet','PARQUET',0,2,898,'{1:40,2:54,3:66}','{1:2,2:2,3:2}','{1:0,2:0,3:0}','{}','{1:null,2:null,3:null}','{1:null,2:null,3:null}','NULL','[4]','NULL',0,'{"d":{"column_size":66,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-13"},"i":{"column_size":40,"value_count":2,"null_value_count":
 [...]
-2,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00002.parquet','PARQUET',0,2,657,'{1:40,3:40}','{1:2,3:2}','{1:0,3:0}','{}','{1:null,3:null}','{1:null,3:null}','NULL','[4]','[1,3]',0,'{"d":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-23"},"i":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"
 [...]
+2,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00002.parquet','PARQUET',0,2,697,'{1:40,3:66}','{1:2,3:2}','{1:0,3:0}','{}','{1:"AQAAAA==",3:"+EwAAA=="}','{1:"AgAAAA==",3:"+EwAAA=="}','NULL','[4]','[1,3]',0,'{"d":{"column_size":66,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-13"},"i":{"column_size":40,"value_count":2,"null_value_count":0,
 [...]
+0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00001.parquet','PARQUET',0,2,885,'{1:40,2:62,3:40}','{1:2,2:2,3:2}','{1:0,2:0,3:0}','{}','{1:"AgAAAA==",2:"c3RyMl91cGRhdGVk",3:"+EwAAA=="}','{1:"AwAAAA==",2:"c3RyMw==",3:"Ak0AAA=="}','NULL','[4]','NULL',0,'{"d":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-23"},"i":{"column_
 [...]
+0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00001.parquet','PARQUET',0,2,898,'{1:40,2:54,3:66}','{1:2,2:2,3:2}','{1:0,2:0,3:0}','{}','{1:"AQAAAA==",2:"c3RyMQ==",3:"+EwAAA=="}','{1:"AgAAAA==",2:"c3RyMg==",3:"+EwAAA=="}','NULL','[4]','NULL',0,'{"d":{"column_size":66,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-13"},"i":{"column_size":40
 [...]
+2,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00002.parquet','PARQUET',0,2,657,'{1:40,3:40}','{1:2,3:2}','{1:0,3:0}','{}','{1:"AgAAAA==",3:"+EwAAA=="}','{1:"AwAAAA==",3:"Ak0AAA=="}','NULL','[4]','[1,3]',0,'{"d":{"column_size":40,"value_count":2,"null_value_count":0,"nan_value_count":null,"lower_bound":"2023-12-13","upper_bound":"2023-12-23"},"i":{"column_size":40,"value_count":2,"null_value_count":0,
 [...]
 ---- TYPES
 
INT,STRING,STRING,INT,BIGINT,BIGINT,STRING,STRING,STRING,STRING,STRING,STRING,BINARY,STRING,STRING,INT,STRING
 ====
diff --git 
a/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
 
b/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
index 03c002d61..266d90461 100644
--- 
a/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
+++ 
b/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test
@@ -429,37 +429,3 @@ select key, value from 
pos_item_key_value_complextypestbl.int_map;
 ---- TYPES
 STRING,INT
 ====
----- QUERY
-# Tests specifically for BINARY type in complex types.
-# BINARY is currently not supported in complex types in select lists
-# due to uncertainty about formatting (IMPALA-11491).
-select binary_member_col.b from binary_in_complex_types
----- TYPES
-BINARY
----- RESULTS
-'member'
-====
----- QUERY
-select a.item from binary_in_complex_types t, t.binary_item_col a
----- TYPES
-BINARY
----- RESULTS
-'item1'
-'item2'
-====
----- QUERY
-select m.key, m.value from binary_in_complex_types t, t.binary_key_col m
----- TYPES
-BINARY,INT
----- RESULTS
-'key1',1
-'key2',2
-====
----- QUERY
-select m.key, m.value from binary_in_complex_types t, t.binary_value_col m
----- TYPES
-INT,BINARY
----- RESULTS
-1,'value1'
-2,'value2'
-====
diff --git a/tests/query_test/test_scanners.py 
b/tests/query_test/test_scanners.py
index 30602e9b0..59a179fdf 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -2002,6 +2002,21 @@ class TestBinaryType(ImpalaTestSuite):
     self.run_test_case('QueryTest/binary-type', vector)
 
 
+class TestBinaryInComplexType(ImpalaTestSuite):
+  @classmethod
+  def get_workload(cls):
+    return 'functional-query'
+
+  @classmethod
+  def add_test_dimensions(cls):
+    super(TestBinaryInComplexType, cls).add_test_dimensions()
+    cls.ImpalaTestMatrix.add_constraint(
+        lambda v: v.get_value('table_format').file_format in ['parquet', 
'orc'])
+
+  def test_binary_in_complex_type(self, vector):
+    self.run_test_case('QueryTest/binary-in-complex-type', vector)
+
+
 class TestParquetV2(ImpalaTestSuite):
   @classmethod
   def get_workload(cls):

Reply via email to