MonkeyCanCode opened a new pull request, #176: URL: https://github.com/apache/polaris/pull/176
# Description

`docker-compose-jupyter.yml` is the demo that shows how to interact with Iceberg through Spark, using Polaris as the catalog and a Jupyter notebook as the UI. Currently, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set through the environment, but only on the `polaris` container and not on the `jupyter` container. This causes a failure, because the `jupyter` container is where the write happens via Spark. Also, `AWS_REGION` is implicitly set to `us-west-2` for the `jupyter` container within the code (and outside it too, via a hard-coded environment variable); we should set it in one place instead.

Fix https://github.com/apache/polaris/issues/144

## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

# How Has This Been Tested?
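Before walking through the steps in this section, it can help to confirm that the credentials are actually visible inside the `jupyter` container, since that is where Spark performs the write. The following is a minimal sketch and not part of this PR: it assumes the notebook runs a Python kernel, and `missing_aws_vars` and `REQUIRED_AWS_VARS` are hypothetical names introduced only for illustration.

```python
import os

# Variables that, per the description above, must reach the `jupyter` container.
# (Hypothetical helper constant, not taken from the PR.)
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty.

    `env` defaults to the current process environment; pass a dict to
    check an arbitrary mapping instead.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing in this container:", ", ".join(missing))
    else:
        print("AWS credentials are visible to this process.")
```

Run in a notebook cell, an empty result suggests both variables reached the container. A non-empty result would be consistent with the unhappy path, where only the `polaris` container received them.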
This has been tested locally with the following steps:

Unhappy path (current code):
```
# create .env file for `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
# build and start containers
docker compose -f docker-compose-jupyter.yml up
# get auth info from logs (as default is in-memory)
docker logs xxxx | grep xxxx
# update jupyter notebook for polaris's auth info as well as AWS ARN and S3 path
# run through the notebook
# failed on create table due to no access to the s3 bucket
```

Happy path (fixed code):
```
# create .env file for `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
# build and start containers
docker compose -f docker-compose-jupyter.yml up
# get auth info from logs (as default is in-memory)
docker logs xxxx | grep xxxx
# update jupyter notebook for polaris's auth info as well as AWS ARN and S3 path
# run through the notebook
# all cells completed (a couple of cells are expected to fail due to access on role/catalog, so they are still considered completed)
```

Reasoning: The AWS ARN is just a role, and we still need a user to assume that role, which is why the user's credentials must also be available in the `jupyter` container.

# Checklist:

Please delete options that are not relevant.

- [x] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules
- [ ] If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue
- [x] I have signed and submitted the [ICLA](https://github.com/polaris-catalog/polaris/blob/main/ICLA.md) and if needed, the [CCLA](https://github.com/polaris-catalog/polaris/blob/main/CCLA.md).
See [Contributing](https://github.com/polaris-catalog/polaris/blob/main/CONTRIBUTING.md) for details.