[
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383300
]
ASF GitHub Bot logged work on BEAM-8399:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375601513
##########
File path: sdks/python/apache_beam/io/hadoopfilesystem.py
##########
@@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server
Review comment:
`hdfs://` URLs always use `/` as the separator, hence the use of
`posixpath.join` instead of `os.path.join` in this module.
Can you give me an example URL with `\` that you use in Windows, and the
name of the tool or client that supports it?
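To illustrate the separator point, here is a minimal sketch comparing the two
standard-library join implementations. It assumes nothing about Beam itself;
`ntpath` is simply the module that `os.path` resolves to on Windows, so it
shows what `os.path.join` would produce there. The path components are made up
for the example.

```python
import ntpath      # what os.path resolves to on Windows
import posixpath   # what os.path resolves to on POSIX systems

# posixpath.join always inserts '/', regardless of the host OS.
posix_joined = posixpath.join('/parent', 'child', 'file.txt')

# ntpath.join inserts '\' between components, as os.path.join would on Windows.
nt_joined = ntpath.join('/parent', 'child', 'file.txt')

print(posix_joined)
print(nt_joined)
```

Using `posixpath` directly keeps the joined path the same on every platform,
which is what a URL-style path needs.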
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 383300)
Time Spent: 1.5h (was: 1h 20m)
> Python HDFS implementation should support filenames of the format
> "hdfs://namenodehost/parent/child"
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seem to be the
> correct filename formats for HDFS based on [1], but we currently support
> the format "hdfs://parent/child".
> To avoid breaking existing users, we have to either (1) support both
> versions by default (based on [2], HDFS does not appear to allow colons in
> file paths, so this might be possible), or (2) make
> "hdfs://namenodehost/parent/child" optional for now and change it to the
> default after a few versions.
> We should also make sure that Beam Java and Python HDFS file-system
> implementations are consistent in this regard.
>
> [1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
> [2] https://issues.apache.org/jira/browse/HDFS-13
>
> cc: [~udim]
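A rough sketch of option (2) from the description above, where the full-URL
format is opt-in. This is not Beam's implementation: `HDFS_SCHEME`,
`parse_hdfs_url`, `join_hdfs_url`, and the `full_urls` parameter are
hypothetical names chosen for illustration (the parameter loosely mirrors the
`--hdfs_full_urls` option named in the pull request title).

```python
import posixpath

HDFS_SCHEME = 'hdfs://'  # hypothetical constant for this sketch


def parse_hdfs_url(url, full_urls):
  """Return (server, path) for an hdfs:// URL. Hypothetical helper."""
  if not url.startswith(HDFS_SCHEME):
    raise ValueError('expected an hdfs:// URL, got %r' % url)
  rest = url[len(HDFS_SCHEME):]
  if not full_urls:
    # Legacy format "hdfs://parent/child": everything after the scheme
    # is treated as the path, and there is no server component.
    return None, '/' + rest
  # Full format "hdfs://namenodehost/parent/child": the first component
  # is the server, the remainder is the path.
  server, _, path = rest.partition('/')
  if not server:
    raise ValueError('missing server in %r' % url)
  return server, '/' + path


def join_hdfs_url(base_url, *paths, full_urls=False):
  """Join path components onto base_url, keeping the URL format."""
  server, base = parse_hdfs_url(base_url, full_urls)
  joined = posixpath.join(base, *paths)  # always '/'-separated
  if server is None:
    return HDFS_SCHEME + joined[1:]      # legacy: drop the leading '/'
  return HDFS_SCHEME + server + joined


print(join_hdfs_url('hdfs://namenodehost/parent', 'child', full_urls=True))
print(join_hdfs_url('hdfs://parent', 'child'))
```

Keeping the server separate from the path means only the path part goes
through `posixpath.join`, so the same join logic serves both formats.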