sbp commented on issue #628:
URL:
https://github.com/apache/tooling-trusted-releases/issues/628#issuecomment-3867226138
Some revisions won't introduce archives, and only `create_and_manage` can
detect this. Therefore it will have to decide for itself whether to move files
to `quarantined` or to `unfinished`. I thought about considering the staging
directory as the `quarantined` directory, so that we wouldn't need any new
directories, but then the status of the files wouldn't be clear from the
filesystem alone.
We should not create a `Revision` object until the files are moved from
`quarantined` to `unfinished`. Since this can only be done by a task that
`create_and_manage` creates, that means that when the `create_and_manage`
destination is `quarantined`, it cannot create a `Revision`. It should always
set permissions, no matter what, but then it should hand over to the
quarantined files validation task, which would only create the `Revision`
object when the validation passes, in an atomic database transaction block that
also includes moving the files from `quarantined` into `unfinished`.
Always creating a new `Revision` as late as possible has the advantage that
we don't use a revision number before we're sure that the files will be kept.
But it does incur the difficulty that the user would have no indication that
the staging and quarantined file validation periods are ongoing. We could add a
new database object for this, or we could name the paths more carefully to
allow discovery from the filesystem. We could, for example, name the staging
directory:
```
temporary/revising/PROJECT/VERSION/ASF_UID/PRIOR_REVISION_NUMBER/UNIXTIME/
```
And the quarantined directory similarly:
```
temporary/quarantined/PROJECT/VERSION/ASF_UID/PRIOR_REVISION_NUMBER/UNIXTIME/
```
These paths are quite verbose, but if
`temporary/{revising,quarantined}/PROJECT/VERSION/ASF_UID` exists, then we know
that either revising or quarantine checks are taking place. We only have to
check those two deterministic locations.
(If we put the `PRIOR_REVISION_NUMBER` before the `ASF_UID` then we would
have to check every existing revision number. We need `PRIOR_REVISION_NUMBER`
and `UNIXTIME` to ensure that the upload is unique. We can either disallow more
than one upload per second, use a random token instead of the unixtime, or use
both. The `PRIOR_REVISION_NUMBER` would have to be a sentinel on first upload,
e.g. `_____`. An alternative to this entire naming scheme is that the task
object in the database for the quarantined file validation could be used as the
signal that those checks are taking place, but since no such object exists for
the revising period, that would be inconsistent.)
We can set `creating.quarantined = True` so that callers will know whether
or not quarantining is happening and act accordingly, without having to perform
any storage check. This could be used to set a flash message to inform the user
that their files are not available yet, even to them, because they're being
checked in quarantine. We would still need to perform a check for the path
within the rest of the page, however, because otherwise the quarantine status
wouldn't be evident in the user interface. We should probably give similar
messages when any `revising` directories are found. Note that `creating.new`
will also be `None` in the `creating.quarantined = True` case, so perhaps we
can rely solely on that. Some callers of `create_and_manage` rely on the new
revision being available, e.g. the SSH function uses it to show the path to the
new revision. All such code will have to be adapted.
We would also still need to do some of the fast validation in
`create_and_manage`. We could still detect symlinks there, for example, because
that can be done when iterating through all of the path information which needs
to be done anyway, e.g. to detect new uploaded archives.
Quarantine validation should only be for characteristics that we judge to be
_dangerous_, not merely _wrong_. Empty files, for example, are wrong but very
unlikely to be dangerous. A `.txt` file, on the other hand, which is an
executable binary on a widely used OS is suspicious enough that the entire
upload should be rejected. There is, however, an ambiguous area in between
these extremes. A ZIP file which has legitimate package paths which, however,
all use a leading `/` is far more likely to be a mistake than an attack. Even
so, since we cannot differentiate, we should consider such cases to be
dangerous. Indeed, even if the result was a mistake during the build process,
the result is still actually dangerous.
When validation fails, we should permanently record the reason, and the
metadata of the files that were uploaded, but we should delete the entire
corresponding quarantined directory. Part of the validation process is to
extract archives, which should be done in `cache`. We would keep the results of
those extractions, upon successful validation, available for subsequent tasks
and interface actions, but upon failure these results should also be deleted.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]