BenjaminSchubert commented on code in PR #37:
URL: https://github.com/apache/buildstream-plugins/pull/37#discussion_r1032921635


##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
         try:
             mirror_file = self._get_mirror_file()
             with tarfile.open(mirror_file) as tar:
-                tar.extractall(path=directory)
+
+                def is_within_directory(directory, target):
+                    abs_directory = os.path.abspath(directory)
+                    abs_target = os.path.abspath(target)
+
+                    prefix = os.path.commonprefix([abs_directory, abs_target])
+
+                    return prefix == abs_directory
+
+                def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+                    for member in tar.getmembers():
+                        member_path = os.path.join(path, member.name)
+                        if not is_within_directory(path, member_path):
+                            raise Exception("Attempted Path Traversal in Tar File")

Review Comment:
   If I remember our contracts with sources correctly, this should be a
   `SourceError`. It would also be nice to mention which file violates which
   constraint; otherwise I don't think a user would be able to debug this easily.
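A minimal sketch of the check the reviewer describes, assuming it raises buildstream's `SourceError` with a message naming both the archive and the offending entry. To stay self-contained it defines a stand-in `SourceError` class (in the plugin the real one would presumably be imported from `buildstream`); the helper `check_member` and its signature are hypothetical.

```python
import os
import tarfile


class SourceError(Exception):
    """Hypothetical stand-in for buildstream's SourceError, so this runs on its own."""


def check_member(archive: str, directory: str, member: tarfile.TarInfo) -> None:
    """Raise SourceError if `member` would be written outside `directory`.

    The message names the archive, the entry and the resolved target, so a
    user can tell which file is at fault and why.
    """
    root = os.path.realpath(directory)
    # os.path.join discards `root` if member.name is absolute, so absolute
    # entries such as "/etc/passwd" are caught by the same check.
    target = os.path.realpath(os.path.join(root, member.name))
    # commonpath compares whole path components, unlike commonprefix.
    if os.path.commonpath([root, target]) != root:
        raise SourceError(
            f"{archive}: entry '{member.name}' would be extracted to "
            f"'{target}', outside of '{root}'"
        )
```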



##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
         try:
             mirror_file = self._get_mirror_file()
             with tarfile.open(mirror_file) as tar:
-                tar.extractall(path=directory)
+
+                def is_within_directory(directory, target):
+                    abs_directory = os.path.abspath(directory)
+                    abs_target = os.path.abspath(target)
+
+                    prefix = os.path.commonprefix([abs_directory, abs_target])
+
+                    return prefix == abs_directory
+
+                def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+                    for member in tar.getmembers():
+                        member_path = os.path.join(path, member.name)
+                        if not is_within_directory(path, member_path):
+                            raise Exception("Attempted Path Traversal in Tar File")
+
+                    tar.extractall(path, members, numeric_owner=numeric_owner)

Review Comment:
   My understanding of the `tar` format is that it contains no global header, so
   calling `tar.getmembers()` and then `tar.extractall()` actually ends up reading
   the file twice, which is wasteful. If we want to sanitize, I would rather
   extract each entry one by one after we've sanitized its destination.
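A sketch of the single-pass approach, assuming the archive is opened with `tarfile.open()` as in the diff. Iterating a `TarFile` reads member headers one at a time, so each entry can be validated and extracted as it is reached instead of scanning the whole archive first. The function name `extract_checked` is hypothetical, and it raises `ValueError` only to stay standalone; the plugin would presumably raise a `SourceError`.

```python
import os
import tarfile


def extract_checked(archive: str, directory: str) -> None:
    """Extract `archive` into `directory` in a single pass over the file.

    Each member is checked as its header is read and extracted right away,
    rather than calling getmembers() (one full read) and then extractall()
    (a second full read).
    """
    root = os.path.realpath(directory)
    with tarfile.open(archive) as tar:
        for member in tar:
            target = os.path.realpath(os.path.join(root, member.name))
            # Only the entry's own name is checked; link targets are not.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(
                    f"{archive}: entry '{member.name}' escapes '{root}'"
                )
            tar.extract(member, path=root)
```

One caveat: unlike `extractall()`, which applies directory permissions and timestamps after all members are written, `extract()` applies them immediately, so an archive containing a read-only directory followed by files inside it may fail to extract this way. The check also covers only member names, not symlink or hardlink targets.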



##########
src/buildstream_plugins/sources/cargo.py:
##########
@@ -148,7 +148,25 @@ def stage(self, directory):
         try:
             mirror_file = self._get_mirror_file()
             with tarfile.open(mirror_file) as tar:
-                tar.extractall(path=directory)
+
+                def is_within_directory(directory, target):
+                    abs_directory = os.path.abspath(directory)
+                    abs_target = os.path.abspath(target)
+
+                    prefix = os.path.commonprefix([abs_directory, abs_target])
+
+                    return prefix == abs_directory
+
+                def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
+
+                    for member in tar.getmembers():
+                        member_path = os.path.join(path, member.name)
+                        if not is_within_directory(path, member_path):

Review Comment:
   I believe we could simplify the logic with `pathlib` here:
   
   ```suggestion
                   def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
                       path = Path(path).resolve()
                       
                       for member in tar.getmembers():
                           member_path = path.joinpath(member.name).resolve()
                           if path not in member_path.parents:
   ```
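A self-contained version of the suggested `pathlib` check, written as a hypothetical helper `is_within`. Because it compares whole path components, it avoids a weakness of `os.path.commonprefix`, which compares strings character by character and therefore treats `/tmp/foobar` as lying inside `/tmp/foo`.

```python
from pathlib import Path


def is_within(directory: str, member_name: str) -> bool:
    """Return True if `member_name` resolves to `directory` or somewhere below it.

    Path.resolve() collapses '..' components and follows existing symlinks;
    membership in `parents` then compares whole path components.
    """
    root = Path(directory).resolve()
    target = root.joinpath(member_name).resolve()
    # Also accept the root itself: an entry such as "./" resolves to root,
    # which is not among target.parents.
    return target == root or root in target.parents
```

The extra `target == root` case matters because a bare `path not in member_path.parents` test would reject a leading `./` entry. On Python 3.9+, `target.is_relative_to(root)` expresses the same condition.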



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
