sbp commented on issue #628:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/628#issuecomment-3867226138

   Some revisions won't introduce archives, and only `create_and_manage` can 
detect this. Therefore it will have to decide for itself whether to move files 
to `quarantined` or to `unfinished`. I thought about considering the staging 
directory as the `quarantined` directory, so that we wouldn't need any new 
directories, but then the status of the files wouldn't be clear from the 
filesystem alone.
   
   We should not create a `Revision` object until the files are moved from 
`quarantined` to `unfinished`. Since this can only be done by a task that 
`create_and_manage` creates, that means that when the `create_and_manage` 
destination is `quarantined`, it cannot create a `Revision`. It should always 
set permissions, no matter what, but then it should hand over to the 
quarantined files validation task, which would only create the `Revision` 
object when the validation passes, in an atomic database transaction block that 
also includes moving the files from `quarantined` into `unfinished`.
   
   Always creating a new `Revision` as late as possible has the advantage that 
we don't use a revision number before we're sure that the files will be kept. 
But it does incur the difficulty that the user would have no indication that 
the staging and quarantined file validation periods are ongoing. We could add a 
new database object for this, or we could name the paths more carefully to 
allow discovery from the filesystem. We could, for example, name the staging 
directory:
   
   ```
   temporary/revising/PROJECT/VERSION/ASF_UID/PRIOR_REVISION_NUMBER/UNIXTIME/
   ```
   
   And the quarantined directory similarly:
   
   ```
   temporary/quarantined/PROJECT/VERSION/ASF_UID/PRIOR_REVISION_NUMBER/UNIXTIME/
   ```
   
   These paths are quite verbose, but if 
`temporary/{revising,quarantined}/PROJECT/VERSION/ASF_UID` exists, then we know 
that either revising or quarantine checks are taking place. We only have to 
check those two deterministic locations.
   
   (If we put the `PRIOR_REVISION_NUMBER` before the `ASF_UID` then we would 
have to check every existing revision number. We need `PRIOR_REVISION_NUMBER` 
and `UNIXTIME` to ensure that the upload is unique. We can either disallow more 
than one upload per second, use a random token instead of the unixtime, or use 
both. The `PRIOR_REVISION_NUMBER` would have to be a sentinel on first upload, 
e.g. `_____`. An alternative to this entire naming scheme is that the task 
object in the database for the quarantined file validation could be used as the 
signal that those checks are taking place, but since no such object exists for 
the revising period, that would be inconsistent.)
   
   We can set `creating.quarantined = True` so that callers will know whether 
or not quarantining is happening and act accordingly, without having to perform 
any storage check. This could be used to set a flash message to inform the user 
that their files are not available yet, even to them, because they're being 
checked in quarantine. We would still need to perform a check for the path 
within the rest of the page, however, because otherwise the quarantine status 
wouldn't be evident in the user interface. We should probably give similar 
messages when any `revising` directories are found. Note that `creating.new` 
will also be `None` in the `creating.quarantined = True` case, so perhaps we 
can rely solely on that. Some callers of `create_and_manage` rely on the new 
revision being available, e.g. the SSH function uses it to show the path to the 
new revision. All such code will have to be adapted.
   
   We would also still need to do some of the fast validation in 
`create_and_manage`. We could still detect symlinks there, for example, because 
that can be done when iterating through all of the path information which needs 
to be done anyway, e.g. to detect new uploaded archives.
   
   Quarantine validation should only be for characteristics that we judge to be 
_dangerous_, not merely _wrong_. Empty files, for example, are wrong but very 
unlikely to be dangerous. A `.txt` file, on the other hand, which is an 
executable binary on a widely used OS is suspicious enough that the entire 
upload should be rejected. There is, however, an ambiguous area in between 
these extremes. A ZIP file which has legitimate package paths which, however, 
all use a leading `/` is far more likely to be a mistake than an attack. Even 
so, since we cannot differentiate, we should consider such cases to be 
dangerous. Indeed, even if the result was a mistake during the build process, 
the result is still actually dangerous.
   
   When validation fails, we should permanently record the reason, and the 
metadata of the files that were uploaded, but we should delete the entire 
corresponding quarantined directory. Part of the validation process is to 
extract archives, which should be done in `cache`. We would keep the results of 
those extractions, upon successful validation, available for subsequent tasks 
and interface actions, but upon failure these results should also be deleted.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to