viirya commented on code in PR #178:
URL: https://github.com/apache/arrow-datafusion-comet/pull/178#discussion_r1518635306


##########
common/src/main/java/org/apache/comet/vector/CometPlainVector.java:
##########
@@ -113,7 +113,11 @@ public UTF8String getUTF8String(int rowId) {
       byte[] result = new byte[length];
       Platform.copyMemory(
           null, valueBufferAddress + offset, result, Platform.BYTE_ARRAY_OFFSET, length);
-      return UTF8String.fromString(convertToUuid(result).toString());
+      if (length == 16) {
+        return UTF8String.fromString(convertToUuid(result).toString());
+      } else {
+        return UTF8String.fromBytes(result);
+      }

Review Comment:
   In the TPC-DS queries, this path is hit by non-UUID FLBA (fixed-length byte array) values, i.e. values that are not 16 bytes long, being read as strings. I am actually not sure the UUID mapping (FLBA -> string) claim in the existing code comment holds:
   
   > // Iceberg maps UUID to StringType.
   > // The data type here must be UUID because the only FLBA -> String mapping we have is UUID.
   
   That claim looks incorrect, because there evidently is a non-UUID FLBA -> string mapping. 🤔
   The length check can still go wrong if an FLBA value happens to be 16 bytes long but is an ordinary string rather than a UUID.
   
   Without this change, however, the queries do not pass.
   
   Is there any metadata we can use to tell whether a column is a UUID column?
   
   cc @huaxingao 
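
To make the ambiguity concrete, here is a minimal sketch of what a metadata-driven decision could look like instead of the length check. The class `FlbaDecodeSketch`, the helper `toUuid`, and the `isUuidColumn` flag are all hypothetical; in particular, `toUuid` assumes big-endian byte order and is not necessarily what `convertToUuid` does. The Parquet format does define a UUID logical type that annotates a 16-byte fixed-length byte array, so the flag could plausibly be derived from the column's schema, but whether that annotation is available at this point in the reader is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch, not Comet's actual code: decode an FLBA value based on
// column metadata instead of guessing from the byte length.
final class FlbaDecodeSketch {

  // Assumption: the 16 bytes hold the UUID in big-endian order
  // (most significant 8 bytes first).
  static UUID toUuid(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    long mostSigBits = buf.getLong();
    long leastSigBits = buf.getLong();
    return new UUID(mostSigBits, leastSigBits);
  }

  // isUuidColumn is a hypothetical flag that would come from schema metadata.
  static String decode(byte[] bytes, boolean isUuidColumn) {
    if (isUuidColumn) {
      if (bytes.length != 16) {
        throw new IllegalArgumentException(
            "UUID column requires 16 bytes, got " + bytes.length);
      }
      return toUuid(bytes).toString();
    }
    // Any other FLBA, including one that happens to be 16 bytes long,
    // is treated as a plain UTF-8 string.
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With the 16-byte ASCII value `0123456789abcdef`, `decode(value, false)` returns the string unchanged, while `decode(value, true)` returns `30313233-3435-3637-3839-616263646566`. A length-only check would always pick the second form for this value, which is the mis-decoding risk described above.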



