yzeng1618 commented on code in PR #9270:
URL: https://github.com/apache/seatunnel/pull/9270#discussion_r2076684475


##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleTypeConverter.java:
##########
@@ -202,9 +213,17 @@ public Column convert(BasicTypeDefine typeDefine) {
                 builder.columnLength(BYTES_4GB - 1);
                 break;
             case ORACLE_BLOB:
-                builder.dataType(PrimitiveByteArrayType.INSTANCE);
-                // The maximum length of the column is 4GB-1
-                builder.columnLength(BYTES_4GB - 1);
+                if (handleBlobAsString) {
+                    builder.dataType(BasicType.STRING_TYPE);
+                    builder.columnLength(BYTES_4GB - 1);
+                    log.info("Converted BLOB to STRING_TYPE with length: {}", BYTES_4GB - 1);
+                } else {
+                    builder.dataType(PrimitiveByteArrayType.INSTANCE);
+                    builder.columnLength(BYTES_4GB - 1);
+                    log.info(
+                            "Converted BLOB to PrimitiveByteArrayType with length: {}",
+                            BYTES_4GB - 1);
+                }

Review Comment:
   Thank you for your question. The design choice to handle BLOB data as strings 
is based on real customer use cases. We've encountered numerous situations 
where customers store XML files or other textual content in Oracle BLOB fields 
rather than CLOB, particularly in legacy systems or specific application 
scenarios.
   
   When attempting to process these BLOB data directly in SQL queries using 
type conversions (like TO_CLOB), users often face byte size limitations, 
especially with large XML files. These limitations can result in data 
truncation or conversion failures.
   
   This is why we implemented the handleBlobAsString option, providing users 
with flexibility during the ETL process:
   
   1. When the data is actually textual content (like XML), it can be converted 
to STRING_TYPE.
   2. When the data is genuinely binary, it maintains its original binary form.
   
   This approach circumvents Oracle's internal type conversion limitations, 
offering more reliable processing capabilities for large textual data while 
maintaining backward compatibility.
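   
   To make the two cases concrete, here is a minimal sketch of how a value 
read from a BLOB column might be handled on each branch. It is only an 
illustration: the class `BlobValueSketch` and the method `convertBlobValue` are 
hypothetical names that do not appear in this PR, and decoding as UTF-8 is an 
assumption about how the text was stored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BlobValueSketch {

    /**
     * Hypothetical helper (not part of the PR): mirrors the two branches of
     * the type converter at the value level.
     */
    public static Object convertBlobValue(byte[] blobBytes, boolean handleBlobAsString) {
        if (blobBytes == null) {
            return null;
        }
        if (handleBlobAsString) {
            // Assumption: the BLOB holds UTF-8 encoded text such as XML.
            return new String(blobBytes, StandardCharsets.UTF_8);
        }
        // Genuinely binary data keeps its original form (defensive copy).
        return Arrays.copyOf(blobBytes, blobBytes.length);
    }

    public static void main(String[] args) {
        byte[] xml = "<order id=\"1\"/>".getBytes(StandardCharsets.UTF_8);

        Object asText = convertBlobValue(xml, true);
        Object asBytes = convertBlobValue(xml, false);

        System.out.println("text branch returns String: " + (asText instanceof String));
        System.out.println("binary branch returns byte[]: " + (asBytes instanceof byte[]));
    }
}
```

   With the flag off, the bytes pass through unchanged, which is what keeps 
the existing behavior intact for users who do not opt in.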



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
