BenjaminSchubert commented on code in PR #37:
URL:
https://github.com/apache/buildstream-plugins/pull/37#discussion_r1032921635
##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
try:
mirror_file = self._get_mirror_file()
with tarfile.open(mirror_file) as tar:
- tar.extractall(path=directory)
+
+ def is_within_directory(directory, target):
+ abs_directory = os.path.abspath(directory)
+ abs_target = os.path.abspath(target)
+
+ prefix = os.path.commonprefix([abs_directory, abs_target])
+
+ return prefix == abs_directory
+
+ def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+ for member in tar.getmembers():
+ member_path = os.path.join(path, member.name)
+ if not is_within_directory(path, member_path):
+ raise Exception("Attempted Path Traversal in Tar File")
Review Comment:
I think this should be a `SourceError`, if I remember our contract with
sources correctly. Additionally, the message should say which file is in
violation and why; I don't think a user could debug this easily otherwise.
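A rough sketch of the kind of message I have in mind. The `SourceError` class
below is only a local stand-in so the snippet runs on its own (in the plugin it
would presumably be the exception BuildStream provides), and `check_member` is
a hypothetical helper name, not something that exists in the PR:

```python
import os


class SourceError(Exception):
    """Stand-in for BuildStream's SourceError so this sketch runs standalone."""


def check_member(archive, directory, member_name):
    # Hypothetical helper: raise a descriptive error when a member would
    # be written outside the staging directory.
    base = os.path.realpath(directory)
    target = os.path.realpath(os.path.join(base, member_name))
    # commonpath compares whole path components, not raw characters.
    if os.path.commonpath([base, target]) != base:
        raise SourceError(
            "Refusing to extract '{}' from '{}': it resolves to '{}', "
            "which is outside '{}'".format(member_name, archive, target, base)
        )
```

With something like this, the user sees the archive, the offending member and
the directory it tried to escape from, rather than a bare "path traversal"
message.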
##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
try:
mirror_file = self._get_mirror_file()
with tarfile.open(mirror_file) as tar:
- tar.extractall(path=directory)
+
+ def is_within_directory(directory, target):
+ abs_directory = os.path.abspath(directory)
+ abs_target = os.path.abspath(target)
+
+ prefix = os.path.commonprefix([abs_directory, abs_target])
+
+ return prefix == abs_directory
+
+ def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+ for member in tar.getmembers():
+ member_path = os.path.join(path, member.name)
+ if not is_within_directory(path, member_path):
+ raise Exception("Attempted Path Traversal in Tar File")
+
+ tar.extractall(path, members, numeric_owner=numeric_owner)
Review Comment:
My understanding of the `tar` format is that it contains no global header, so
calling `tar.getmembers()` and then `tar.extractall()` ends up reading the
file twice, which is wasteful. If we want to sanitize, I would rather extract
each entry one by one after we've sanitized its destination.
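Something along these lines is what I mean by extracting one entry at a time.
This is a minimal sketch, not a drop-in replacement: `extract_checked` is a
hypothetical name, it raises a plain `ValueError` for brevity, and it only
checks each member's own destination (link targets are not inspected):

```python
import os
import tarfile


def extract_checked(tar, directory):
    # Hypothetical helper: validate and extract one entry at a time.
    base = os.path.realpath(directory)
    # Iterating over the TarFile yields members as their headers are read,
    # so there is no separate getmembers() scan before extraction starts.
    for member in tar:
        target = os.path.realpath(os.path.join(base, member.name))
        if os.path.commonpath([base, target]) != base:
            raise ValueError(
                "{} would be extracted outside {}".format(member.name, base)
            )
        tar.extract(member, path=base)
```

One trade-off to keep in mind: entries that precede a bad member are already
on disk when the error is raised, so the caller would need to clean up the
staging directory on failure.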
##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
try:
mirror_file = self._get_mirror_file()
with tarfile.open(mirror_file) as tar:
- tar.extractall(path=directory)
+
+ def is_within_directory(directory, target):
+ abs_directory = os.path.abspath(directory)
+ abs_target = os.path.abspath(target)
+
+ prefix = os.path.commonprefix([abs_directory, abs_target])
+
+ return prefix == abs_directory
+
+ def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+ for member in tar.getmembers():
+ member_path = os.path.join(path, member.name)
+ if not is_within_directory(path, member_path):
Review Comment:
I believe we could simplify the logic with `pathlib` here:
```suggestion
def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
path = Path(path).resolve()
for member in tar.getmembers():
member_path = path.joinpath(member.name).resolve()
if path not in member_path.parents:
```
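Beyond being shorter, comparing path components also avoids a pitfall of
`os.path.commonprefix`, which works character by character. A small
illustration with made-up paths (`/staging/crate` and its sibling are
assumptions for the example only):

```python
import os
from pathlib import PurePosixPath

# Made-up paths: a member named "../crate-evil/file" joined onto
# "/staging/crate" normalizes to the sibling path below.
base = "/staging/crate"
sibling = "/staging/crate-evil/file"

# commonprefix compares strings character by character, so a sibling
# directory sharing the same leading characters passes the check.
prefix_check = os.path.commonprefix([base, sibling]) == base

# parents compares whole path components, so the sibling is rejected.
parents_check = PurePosixPath(base) in PurePosixPath(sibling).parents

print(prefix_check, parents_check)
```

Here `prefix_check` is `True` (the escape goes unnoticed) while
`parents_check` is `False`.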
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.