umehrot2 commented on code in PR #6237:
URL: https://github.com/apache/hudi/pull/6237#discussion_r932745731
##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java:
##########
@@ -145,8 +145,11 @@ public static Path convertPathWithScheme(Path oldPath,
String newScheme) {
URI oldURI = oldPath.toUri();
URI newURI;
try {
- newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(),
oldURI.getPort(), oldURI.getPath(),
- oldURI.getQuery(), oldURI.getFragment());
+ newURI = new URI(newScheme,
+ oldURI.getAuthority(),
+ oldURI.getPath(),
+ oldURI.getQuery(),
+ oldURI.getFragment());
Review Comment:
Yes. It will stay fully compatible. The issue is introduced by the use of the
**getHost()** API. That works fine for HDFS-like URIs that have a host and a
port, and for most bucket naming conventions that conform to the Java URI
standard. But for certain patterns, as explained in the Jira, the
**getHost()** API breaks because of the way it tries to parse the hostname.
This constructor works better because it uses **getAuthority()**, which
correctly extracts the bucket name in the case of S3. I do not see a need to
specifically use the **getHost()/getPort()** APIs when **getAuthority()**
can essentially do the same.
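
To illustrate the difference, here is a minimal sketch (the bucket name `my_bucket` and path are hypothetical; S3 historically allowed underscores in bucket names, which are not valid hostname characters for `java.net.URI`):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriAuthorityDemo {
    public static void main(String[] args) throws URISyntaxException {
        // An underscore is not valid in a hostname per RFC 2396, so
        // java.net.URI falls back to a registry-based authority:
        // getHost() returns null, but getAuthority() still works.
        URI uri = new URI("s3://my_bucket/table/base.parquet");

        System.out.println("getHost():      " + uri.getHost());      // null
        System.out.println("getAuthority(): " + uri.getAuthority()); // my_bucket

        // Rebuilding with the host/port-based constructor silently
        // drops the bucket, because getHost() was null:
        URI viaHost = new URI("s3a", uri.getUserInfo(), uri.getHost(),
            uri.getPort(), uri.getPath(), uri.getQuery(), uri.getFragment());
        System.out.println(viaHost); // s3a:/table/base.parquet

        // The authority-based constructor preserves it:
        URI viaAuthority = new URI("s3a", uri.getAuthority(), uri.getPath(),
            uri.getQuery(), uri.getFragment());
        System.out.println(viaAuthority); // s3a://my_bucket/table/base.parquet
    }
}
```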
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]