[
https://issues.apache.org/jira/browse/HUDI-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-5545:
---------------------------------
Labels: pull-request-available (was: )
> Extending support to other special characters for S3EventsMetaSelector
> ----------------------------------------------------------------------
>
> Key: HUDI-5545
> URL: https://issues.apache.org/jira/browse/HUDI-5545
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> This fix covers the following issue.
> I am working on ingestion with S3 as the source, following this
> [blog|https://hudi.apache.org/blog/2021/08/23/s3-events-source/]. However, the second
> job (S3EventsHoodieIncrSource) fails with
> {{HoodieException: org.apache.hudi.exception.HoodieException: Path does not
> exist}}. Our investigation showed that the job fails because of encoded
> characters (these are added by SQS) in the S3 object name.
> Looking deeper into the Hudi source code, we found that Hudi decodes them
> in
> [S3EventsMetaSelector|https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/S3EventsMetaSelector.java#L154],
> but at the moment only = is handled.
> FYI:
> Original S3 object:
> {{s3://<bucket>/s3_parquet_source_data/s3-test+0+0000061344.parquet}}
> Encoded S3 object:
> {{s3://<bucket>/s3_parquet_source_data/s3-test%2B0%2B0000061344.parquet}}
> Note: the workflow ran successfully once the file name was corrected.
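
The encoded and original object names quoted above differ only in percent-encoding, so a general URL decode of the key is one way to cover characters beyond =. The following is a minimal sketch, not the actual Hudi patch: the class S3KeyDecoder and the method decodeKey are hypothetical names made up for illustration, and it assumes the key arrives form-encoded (a literal + sent as %2B, a space sent as +).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Hudi code base.
final class S3KeyDecoder {
    private S3KeyDecoder() {}

    // Decodes every percent-encoded character in the key, not just %3D.
    // Assumption: the key is form-encoded, so URLDecoder's handling of '+'
    // (decoded to a space) matches what the sender intended.
    static String decodeKey(String encodedKey) {
        try {
            // The (String, String) overload is used so the sketch compiles on Java 8.
            return URLDecoder.decode(encodedKey, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // The encoded object name from the report above.
        String encoded = "s3_parquet_source_data/s3-test%2B0%2B0000061344.parquet";
        System.out.println(decodeKey(encoded));
    }
}
```

With the key from the report, decodeKey returns s3_parquet_source_data/s3-test+0+0000061344.parquet, which matches the original object name. If a source ever delivered keys with an unencoded literal +, this approach would turn it into a space, so the encoding assumption should be checked against the actual event payload.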
--
This message was sent by Atlassian Jira
(v8.20.10#820010)