kbendick opened a new issue #2676: URL: https://github.com/apache/iceberg/issues/2676
We currently lack any kind of Docker-based demo environment. We have some sample notebooks and some JMH tests, but we don't have a playground that potential users can easily download to evaluate Iceberg. Having a Docker demo or testing environment has become something of a standard for many open source projects.

Given how many parts are involved in using Iceberg (at least one distributed computing environment, a catalog, somewhere to store data, etc.), the barrier to entry can seem somewhat high, especially for people who don't have a Hive metastore at all. That is probably more common than many people think: there are a lot of data engineers out there just working with files on S3 and maintaining them as tables themselves in whatever ad hoc fashion.

I've opened this ticket as a follow-up to https://github.com/apache/iceberg/issues/1081, since that issue is very old and has not seen any progress in a long time.

I have a pretty decent, basic docker-compose environment that I can push, with Spark, a Hive metastore, and HDFS. I know many people have expressed interest in this. Given that my environment is already usable and pretty complete, I would like to push it as a starting point to get the discussion going. People can then continue to work on bringing in other frameworks, helper shell scripts (I have some, though I'll probably keep it simple to start), and other catalogs (I know the Nessie folks are likely very interested in integrating Nessie into the demo environment). Since my initial work on this is mostly done, I will try to push by early next week and then we can go from there!

@rymurr mentioned that he and some of the Nessie folks currently have a pretty decent setup for a notebook environment that can run on Google Colab without extra infrastructure.
My approach is admittedly more bare-bones (lowest common denominator) and uses a local environment, so I will push what I have by early next week and then we can collaborate from there!

cc @rdblue @rymurr @flyrain @RussellSpitzer @aokolnychyi @nastra @jasonhughes248

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
