suyashj1231 commented on issue #3842:
URL: https://github.com/apache/texera/issues/3842#issuecomment-4772478564

   @aicam I took this issue and spent some time on the `AccessDenied` bug from 
the 10/23 notes. I managed to reproduce it on a `single-node` deployment, and 
from what I can tell it's a SigV4 host mismatch, not anything about the file 
itself. The whiteboard plan (presign-url carrying a filename header, fetched by 
the browser's download manager) is the right idea, but I hit two things that 
change where the presign has to happen.
   
   One, the LakeFS S3 gateway at `lakefs:8000` ignores 
`response-content-disposition`. Presigning a GET through the gateway gives me a 
200 but no `Content-Disposition` header, so the filename never gets set. 
Presigning straight against MinIO does set it, with the correct name: 
`Content-Disposition: attachment; filename="Iris.csv"`. So the filename header 
from the board only takes effect if we presign against MinIO, not the gateway.
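
   For reference, here is a minimal Python sketch of the override value involved. The helper names are hypothetical, not anything in Texera or MinIO. The sketch shows three things: the `Content-Disposition` value we would request, how that value looks once percent-encoded into a SigV4 presigned query string, and a check that tells the two responses above apart. MinIO's header yields `Iris.csv`, while the gateway's missing header yields `None`.

```python
import re
from typing import Optional
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Value to request via response-content-disposition (download as attachment)."""
    # Escape backslashes and double quotes so the name stays a valid quoted-string.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'attachment; filename="{escaped}"'


def as_query_param(value: str) -> str:
    """How the override appears in a SigV4 presigned query (only -_.~ left literal)."""
    return "response-content-disposition=" + quote(value, safe="-_.~")


def filename_from_header(header: Optional[str]) -> Optional[str]:
    """Extract filename="..." from a Content-Disposition response header."""
    if not header:
        return None  # the gateway case: 200 OK, but no Content-Disposition at all
    m = re.search(r'filename="((?:[^"\\]|\\.)*)"', header)
    return re.sub(r"\\(.)", r"\1", m.group(1)) if m else None
```

   For example, `as_query_param(content_disposition("Iris.csv"))` gives `response-content-disposition=attachment%3B%20filename%3D%22Iris.csv%22`.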
   
   Two, the `AccessDenied` is a signed-host mismatch. SigV4 signs the `Host` 
header, so the URL has to be signed for the same endpoint the browser actually 
hits. Signing with the internal host and fetching through the external one fails:
   
   ```
   A) signed host texera-minio:9000, fetched at localhost:9000  -> 403 SignatureDoesNotMatch
   B) signed host localhost:9000,    fetched at localhost:9000  -> 200 OK, Content-Disposition set
   ```
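
   To make the host dependence concrete, here is a stdlib-only Python sketch of SigV4 query-string presigning for a path-style GET. It assumes `UNSIGNED-PAYLOAD` and `host` as the only signed header, as S3 SDKs commonly use for presigned GETs. The bucket, key, credentials and timestamp are made-up placeholders. `server_accepts` is a simplified stand-in for the server-side check: it re-signs using the `Host` it actually received and compares signatures.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _enc(s: str) -> str:
    # SigV4 URI-encoding: only unreserved characters stay literal ('/' is encoded).
    return quote(s, safe="-_.~")


def presign_get(host, bucket, key, access_key, secret_key, now,
                region="us-east-1", expires=3600, extra_params=None):
    """SigV4 query-string presign for a path-style GET on http://{host}."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
        **(extra_params or {}),  # e.g. response-content-disposition, signed too
    }
    query = "&".join(f"{_enc(k)}={_enc(v)}" for k, v in sorted(params.items()))
    uri = "/" + quote(f"{bucket}/{key}", safe="/-_.~")
    canonical_request = "\n".join([
        "GET", uri, query,
        f"host:{host}\n",  # canonical headers: the Host value is part of the signature
        "host",            # signed headers
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    sig = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"http://{host}{uri}?{query}&X-Amz-Signature={sig}"


def signature_of(url: str) -> str:
    return url.rsplit("X-Amz-Signature=", 1)[1]


def server_accepts(url, received_host, **args):
    # Simplified server check: recompute with the Host header actually received
    # (reusing the same inputs rather than parsing them back out of the URL).
    return signature_of(presign_get(received_host, **args)) == signature_of(url)


# Placeholder inputs; the timestamp is fixed so the URLs are reproducible.
ARGS = dict(
    bucket="example-bucket", key="Iris.csv",
    access_key="EXAMPLEKEY", secret_key="EXAMPLESECRET",
    now=datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc),
    extra_params={"response-content-disposition": 'attachment; filename="Iris.csv"'},
)
internal = presign_get("texera-minio:9000", **ARGS)  # case A
external = presign_get("localhost:9000", **ARGS)     # case B
```

   Here `server_accepts(internal, "localhost:9000", **ARGS)` is `False` (case A) and `server_accepts(external, "localhost:9000", **ARGS)` is `True` (case B). Because `response-content-disposition` sits inside the signed query, it has to be included before signing rather than appended to an existing URL.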
   
   That explains why it breaks on one deployment but not another. When the 
backend's S3 endpoint and the browser-facing endpoint are the same host, it 
works; when they differ, the signature is rejected. Region (`us-west-2` vs 
`us-east-1`) made no difference in my tests.
   
   So the version that actually holds up: presign directly against MinIO with 
`response-content-disposition`, signed for the external pre-signed endpoint 
(the browser-facing one). Right now `file-service` only knows the internal 
`STORAGE_S3_ENDPOINT`, so it would also need the external one, the same idea as 
LakeFS's `BLOCKSTORE_S3_PRE_SIGNED_ENDPOINT`. I can take a shot at that if it 
sounds right to you.
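
   A rough sketch of the config side, written in Python for brevity. `STORAGE_S3_PRESIGNED_ENDPOINT` is a hypothetical name modeled on LakeFS's `BLOCKSTORE_S3_PRE_SIGNED_ENDPOINT`, not an existing Texera setting. In this sketch, the backend keeps talking to MinIO over the internal endpoint and uses the external one only as the host it signs for.

```python
from typing import Mapping, Tuple
from urllib.parse import urlsplit


def storage_endpoints(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (endpoint the backend calls, endpoint to presign against)."""
    internal = env["STORAGE_S3_ENDPOINT"]
    # Hypothetical setting; when unset, fall back to the internal endpoint,
    # which matches the current single-endpoint behavior.
    external = env.get("STORAGE_S3_PRESIGNED_ENDPOINT") or internal
    return internal, external


def signing_host(endpoint: str) -> str:
    """Host[:port] that ends up in the signed Host header."""
    return urlsplit(endpoint).netloc
```

   Falling back to the internal endpoint means single-host deployments keep working without new config. Only deployments where the browser reaches MinIO under a different host would need to set the extra variable.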
   

