juergbi commented on code in PR #1997:
URL: https://github.com/apache/buildstream/pull/1997#discussion_r1995346164


##########
src/buildstream/source.py:
##########
@@ -262,10 +262,123 @@ def __init__(
 
 @dataclass
 class AliasSubstitution:
+    """AliasSubstitution()
+    An opaque data structure which may be passed through
+    :func:`SourceFetcher.fetch() <buildstream.source.SourceFetcher.fetch>` and in such cases
+    must be provided to :func:`Source.translate_url() <buildstream.source.Source.translate_url>`.
+    """
+
     _effective_alias: str
     _mirror: Union[SourceMirror, str]
 
 
+class SourceInfoMedium(FastEnum):
+    """
+    Indicates the meduim in which the source is obtained
+
+    *Since: 2.5*
+    """
+
+    LOCAL = "local"
+    """
+    Files stored locally in the project
+    """
+
+    ARCHIVE = "archive"

Review Comment:
   `LOCAL` and `ARCHIVE` seem like orthogonal attributes to me. You can have a 
local archive in your project, and you can fetch a single file (e.g. a config 
file) from a remote server, at least theoretically. Maybe `LOCAL` should be a 
flag rather than part of this enum? (Or it could even be implicit, since the 
corresponding URL will have no host component.)
   
   I'm also not sure whether we should have one enum entry for each archive 
file format (e.g. `TAR` and `ZIP`) or whether a generic `ARCHIVE` makes more 
sense.
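   
   A minimal sketch of the "implicit" option, to illustrate the idea rather 
than propose an API. `Medium`, its `FILE` member and `is_local()` are 
hypothetical names that are not part of this PR; the only real call is 
`urllib.parse.urlsplit()`:

```python
from enum import Enum
from urllib.parse import urlsplit


class Medium(Enum):
    """Hypothetical stand-in for SourceInfoMedium, without a LOCAL member."""

    ARCHIVE = "archive"
    FILE = "file"  # hypothetical: a single, non-archive file
    GIT = "git"


def is_local(url: str) -> bool:
    """Treat a URL with no host component as local to the project."""
    return urlsplit(url).netloc == ""


# Locality and medium become independent: a local archive and a
# remote single file can both be described.
local_archive = ("files/foo.tar.gz", Medium.ARCHIVE)
remote_config = ("https://example.com/app.conf", Medium.FILE)
```

   With this shape, whether a source is local is derived from the URL and 
the enum only describes what kind of thing was obtained.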



##########
src/buildstream/source.py:
##########
@@ -262,10 +262,123 @@ def __init__(
 
 @dataclass
 class AliasSubstitution:
+    """AliasSubstitution()
+    An opaque data structure which may be passed through
+    :func:`SourceFetcher.fetch() <buildstream.source.SourceFetcher.fetch>` and in such cases
+    must be provided to :func:`Source.translate_url() <buildstream.source.Source.translate_url>`.
+    """
+
     _effective_alias: str
     _mirror: Union[SourceMirror, str]
 
 
+class SourceInfoMedium(FastEnum):
+    """
+    Indicates the meduim in which the source is obtained

Review Comment:
   ```suggestion
       Indicates the medium in which the source is obtained
   ```



##########
src/buildstream/downloadablefilesource.py:
##########
@@ -270,6 +270,13 @@ def fetch(self):  # pylint: disable=arguments-differ
                 "File downloaded from {} has sha256sum '{}', not '{}'!".format(self.url, sha256, self.ref)
             )
 
+    def collect_source_info(self):
+        #
+        # XXX remote sources are not necessarily archives, perhaps we should
+        # allow downloadablefilesource imlementations to choose the SourceInfoMedium
+        #
+        return [SourceInfo(self.url, SourceInfoMedium.ARCHIVE, SourceVersionType.SHA256, self.ref)]

Review Comment:
   `self.url` is a fully qualified URL after alias translation. The unique key 
of a source (and thus, the cache key of an element) only covers the aliased 
URL, though. I.e., `collect_source_info()` may return different fully qualified 
URLs (including URLs of internal mirrors) for builds with the same cache key. 
I'm not sure how to solve this but it seems like a potential issue. Or am I 
misreading the code?
   
   This issue is not specific to `DownloadableFileSource`, of course.
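   
   To make the concern concrete, here is a simplified sketch. The 
`translate_url()` helper below is a hypothetical stand-in for alias 
translation, not BuildStream's actual implementation, and the URLs and ref are 
placeholders:

```python
def translate_url(aliased_url: str, aliases: dict) -> str:
    """Hypothetical simplification: expand 'alias:path' using an alias map."""
    alias, _, path = aliased_url.partition(":")
    return aliases[alias] + path


aliased = "upstream:foo-1.0.tar.gz"
ref = "abc123"  # placeholder, not a real checksum

# The unique key only covers the aliased URL and the ref.
unique_key = (aliased, ref)

# The same source resolved with the upstream alias and with a mirror.
upstream_url = translate_url(aliased, {"upstream": "https://example.org/releases/"})
mirror_url = translate_url(aliased, {"upstream": "https://mirror.internal.example/releases/"})

# Same unique key, but the URL that would be reported differs.
assert upstream_url != mirror_url
```

   Under these assumptions, two builds that share a cache key could still 
report different URLs.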



##########
src/buildstream/source.py:
##########
@@ -262,10 +262,123 @@ def __init__(
 
 @dataclass
 class AliasSubstitution:
+    """AliasSubstitution()
+    An opaque data structure which may be passed through
+    :func:`SourceFetcher.fetch() <buildstream.source.SourceFetcher.fetch>` and in such cases
+    must be provided to :func:`Source.translate_url() <buildstream.source.Source.translate_url>`.
+    """
+
     _effective_alias: str
     _mirror: Union[SourceMirror, str]
 
 
+class SourceInfoMedium(FastEnum):
+    """
+    Indicates the meduim in which the source is obtained
+
+    *Since: 2.5*
+    """
+
+    LOCAL = "local"
+    """
+    Files stored locally in the project
+    """
+
+    ARCHIVE = "archive"
+    """
+    An archive file
+    """
+
+    GIT = "git"
+    """
+    A git repository
+    """
+
+
+class SourceVersionType(FastEnum):
+    """
+    Indicates the type of the version string
+
+    *Since: 2.5*
+    """
+
+    VERSION = "version"
+    """
+    The upstream version string, which may be semantic version
+    """
+
+    COMMIT = "commit"
+    """
+    A commit string which accurately represents a version in a source
+    code repository or VCS
+    """
+
+    SHA256 = "sha256"
+    """
+    An sha256 checksum
+    """
+
+    DIGEST = "digest"
+    """
+    A CAS digest representing the unique version of this source input
+    """
+
+
+class SourceInfo:
+    """SourceInfo()
+
+    An object representing the provenance of input reported by
+    :func:`Source.collect_source_info() <buildstream.source.Source.collect_source_info>`
+
+    *Since: 2.5*
+    """
+
+    def __init__(self, url: str, medium: str, version_type: str, version: str):
+        # XXX assert medium and version_type are valid values for the enums
+
+        self.url: str = url
+        """
+        The url of the source input
+        """
+
+        self.medium: str = medium
+        """
+        The :class:`.SourceInfoMedium` of the source input
+        """
+
+        self.version_type: str = version_type

Review Comment:
   A single source URL may have both an upstream version and e.g. a commit or 
hash. Do we want/need to support this? Or is the idea to handle this with 
multiple `SourceInfo` objects pointing to the same URL?
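   
   For illustration, the second option might look roughly like the sketch 
below. `SourceInfoSketch` and `collect_source_info_sketch()` are hypothetical 
stand-ins that mirror the constructor arguments in this diff; the URL and 
version values are placeholders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceInfoSketch:
    """Hypothetical stand-in mirroring SourceInfo's four fields."""

    url: str
    medium: str
    version_type: str
    version: str


def collect_source_info_sketch(url: str, upstream_version: str, commit: str):
    """Report one entry per version type, all pointing at the same URL."""
    return [
        SourceInfoSketch(url, "git", "version", upstream_version),
        SourceInfoSketch(url, "git", "commit", commit),
    ]


infos = collect_source_info_sketch("https://example.org/foo.git", "1.2.3", "0123abcd")
```

   A consumer would then need to group entries by URL to see that they 
describe the same input, which may be an argument for carrying both values in 
one object instead.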



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.