kbendick opened a new issue #2676:
URL: https://github.com/apache/iceberg/issues/2676


   We currently lack any kind of Docker-based demo environment.
   
   We have some sample notebooks and some JMH tests, but we don't have a 
playground that potential users can easily download to evaluate Iceberg.
   
   Having a Docker demo or testing environment has become something of a 
standard for many open source projects. Given how many parts are involved in 
using Iceberg (at least one distributed computing environment, a catalog, 
somewhere to store data, etc.), the barrier to entry can seem high, especially 
for people who don't have a Hive metastore at all. That is probably more common 
than many people think: a lot of data engineers are just working with files on 
S3 and maintaining them as tables themselves in whatever ad-hoc fashion.
   
   I've opened this ticket as a follow-up to 
https://github.com/apache/iceberg/issues/1081, since that issue is very old and 
has not seen any progress in a long time.
   
   I have a pretty decent, basic docker-compose environment with Spark, a Hive 
metastore, and HDFS that I can push.
   
   I know many people have expressed interest in this. Since my environment is 
already usable and pretty complete, I would like to push it as a starting 
point to get the discussion going. People can then continue the work by 
bringing in other frameworks, helper shell scripts (I have some, though I'll 
probably keep it simple to start), and other catalogs (I know the Nessie folks 
are likely very interested in integrating Nessie into the demo environment).
   
   Since my initial work on this is mostly done, I will try to push by early 
next week and then we can go from there!
   
   @rymurr mentioned that he and some of the Nessie folks currently have a 
pretty decent setup for a notebook environment that can run on Google Colab 
without extra infrastructure.
   
   My approach is admittedly more bare-bones (lowest common denominator) and 
uses a local environment, so I will push what I have by early next week and 
then we can collaborate from there!
   
   cc @rdblue @rymurr @flyrain @RussellSpitzer @aokolnychyi  @nastra 
@jasonhughes248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



