yzeng1618 opened a new issue, #9268:
URL: https://github.com/apache/seatunnel/issues/9268

   ### Search before asking
   
   - [x] I have searched the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 issues and found no similar feature requirement.
   
   
   ### Description
   
   The JDBC connector does not preserve the original content of BLOB fields 
read from Oracle databases. The following example demonstrates the problem:
   
   In the Oracle source table (TEST_BLOB_TABLE), we have BLOB data with 
different content types:
   
   - Row 1: Simple text "Hello, World!"
   - Row 2: XML content
   - Row 3: HTML content
   
   However, after synchronization to the Doris target table, all BLOB data is 
converted to Base64-encoded strings:
   
   - Row 1: "SGVsbG8sIFdvcmxkIQ=="
   - Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."
   - Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."
   
   This transformation makes the data unusable in its original form. Users 
cannot work directly with the text, XML, or HTML content as they could in the 
source database. Instead, they must Base64-decode every value to retrieve the 
original content.
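
The relationship between the source and target values for Row 1 can be reproduced outside SeaTunnel. The snippet below is only an illustrative sketch using Python's standard library; it assumes the BLOB holds UTF-8 text and does not reflect how the connector itself is implemented.

```python
import base64

# Sample BLOB payload taken from Row 1 of the example (assumed to be UTF-8 text).
original = "Hello, World!".encode("utf-8")

# The value that currently lands in the target table: the Base64 form of the bytes.
encoded = base64.b64encode(original).decode("ascii")
print(encoded)

# The extra step a user needs today to get the original content back.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

The encoded value matches the string reported for Row 1 in the Doris table, which suggests the bytes themselves are intact and only their representation has changed.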
   
   ### Usage Scenario
   
   This feature is essential for users who need to accurately transfer Oracle 
BLOB data to target systems while preserving the original content format. 
Specific scenarios include:
   
   1. Data Migration Projects: When migrating databases containing BLOB fields 
with text, XML, HTML, or other structured content from Oracle to systems like 
Doris, users need the original content to remain usable.
   2. Document Management Systems: Organizations storing documents (HTML, XML, 
JSON) in Oracle BLOB fields need to maintain the document structure during data 
synchronization.
   3. Application Integration: When applications rely on specific data formats 
stored in BLOB fields, the integrity of these formats must be preserved during 
data transfer.
   4. Data Analysis: Analysts working with structured data stored in BLOB 
fields need the original format for proper analysis rather than encoded strings.
   
   Without this feature, users must implement additional post-processing steps 
to decode and reconstruct the original data, significantly complicating their 
data pipelines.
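
As a rough idea of what such a post-processing step involves today, here is a hedged sketch of a per-value decoder. The function name `decode_blob_column` and the `charset` parameter are hypothetical and are not part of SeaTunnel or Doris; the sketch assumes the target column holds standard Base64 text and that the caller knows (or guesses) the original character set.

```python
import base64
from typing import Union


def decode_blob_column(value: str, charset: str = "utf-8") -> Union[str, bytes]:
    """Reverse the Base64 encoding of one synced BLOB value (hypothetical helper).

    Returns text when the decoded bytes are valid in `charset`,
    and the raw bytes otherwise (for example, for truly binary BLOBs).
    """
    raw = base64.b64decode(value)
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # The BLOB was not text in the assumed charset; keep it as bytes.
        return raw


# Usage: one textual value and one binary value.
text_value = decode_blob_column("SGVsbG8sIFdvcmxkIQ==")
binary_value = decode_blob_column(base64.b64encode(b"\xff\xfe\x00").decode("ascii"))
```

Every consumer of the target table has to repeat this step and pick a charset, which is the overhead the requested feature would remove.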
   
   ### Related issues
   
   no
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

