Never mind, please ignore the previous message. I think it was caused by a
previous installation (0.4.0). I reset the K8s cluster and now everything
installs; I'm following the other steps now.
1) Why does the Submarine server login use maria_dev as the default user name?
2) There's no doc about using the Submarine UI, so we need to add one. I will
file a PR later (TODO).
3) When trying to create a notebook, I first named it "nb_123" and got the
error "K8s submitter: parse Job object failed by Unprocessable Entity, please
try again".
If there are restrictions on notebook names, we should state them in the UI
(e.g. only letters or numbers are supported).
The doc says: "Name of the notebook server. It should be unique and include
no spaces." We need to update both the doc and the UI.
4) When choosing the environment for a notebook, I saw both my-submarine-env
and notebook-env. What is the difference between the two?
- I found that only notebook-env works; my-submarine-env failed to start. We
should remove my-submarine-env from the default helm installation.
5) Also, even when the notebook has not fully started and is not yet running,
the UI indicates it is created:
[image: image.png]
Clicking the notebook name shows an error page; we should improve this
part.
6) After waiting ~5 minutes, the notebook started, but I still cannot access
the notebook UI. Clicking the link in the Submarine UI gives me:
HTTP ERROR 404
Problem accessing /notebook/default/notebook1111/. Reason:
Not Found
And the logs for the notebook pod show:
kubectl logs notebook1111-0
Conda current version is currentVersion=4.8.3;. Moving forward with env creation and activation.
[I 17:45:50.056 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 17:45:52.108 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 17:45:52.144 NotebookApp] Serving notebooks from local directory: /home/jovyan
[I 17:45:52.145 NotebookApp] Jupyter Notebook 6.1.3 is running at:
[I 17:45:52.146 NotebookApp] http://notebook1111-0:8888/notebook/default/notebook1111/
[I 17:45:52.147 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Is it bound to the wrong port?
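On point 3: "Unprocessable Entity" (HTTP 422) is what the Kubernetes API server returns when an object fails validation, and most Kubernetes object names may contain only lowercase alphanumerics and '-', so the underscore in "nb_123" is the likely culprit. Below is a minimal sketch of a check the UI could run before submitting, assuming notebook names must be valid DNS-1123 labels; `notebook_name_error` is a hypothetical helper, not an existing Submarine function.

```python
import re
from typing import Optional

# Assumed rule: a DNS-1123 label (lowercase letters, digits and '-',
# starting and ending with a letter or digit, at most 63 characters).
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAME_LENGTH = 63


def notebook_name_error(name: str) -> Optional[str]:
    """Return an error message for an invalid notebook name, or None if valid.

    Hypothetical helper for the notebook creation form.
    """
    if len(name) > MAX_NAME_LENGTH:
        return f"Name must be at most {MAX_NAME_LENGTH} characters."
    # fullmatch (rather than match with ^...$) also rejects a trailing newline;
    # the pattern needs at least one character, so "" is rejected as well.
    if DNS1123_LABEL.fullmatch(name) is None:
        return ("Name may contain only lowercase letters, digits and '-', "
                "and must start and end with a letter or digit.")
    return None


for candidate in ["nb_123", "nb-123", "notebook1111", "My Notebook"]:
    print(candidate, "->", notebook_name_error(candidate) or "ok")
```

Showing this message next to the name field would tell users about the rule before the request ever reaches K8s, instead of surfacing the raw submitter error.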
On Mon, Nov 2, 2020 at 9:18 AM Wangda Tan wrote:
> Hi Kevin,
>
> Thank you so much for running this release.
>
> I'm trying to follow the helm install steps, but the notebook controller
> fails to start.
>
> I downloaded the RC1 source code and followed the guide:
> https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/helm.md
>
> The pods in my system:
>
> NAMESPACE   NAME                                              READY   STATUS             RESTARTS   AGE
> default     notebook-controller-deployment-58797bdd75-9fstx   0/1     CrashLoopBackOff   7          12m
> default     pytorch-operator-75fd845678-dpc52                 1/1     Running            0          12m
> default     submarine-database-54776644c6-x2zrm               1/1     Running            0          12m
> default     submarine-server-5d846f7b4f-m2fw8                 1/1     Running            0          12m
> default     submarine-traefik-d55c689b5-cq6db                 1/1     Running            0          12m
> default     tf-job-operator-598686fd84-2fwt9                  1/1     Running            0          12m
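With a listing like this, the failing pod can be picked out by flagging rows whose STATUS is not Running or whose READY count is short. A minimal sketch, assuming the plain-text column layout shown above (NAMESPACE NAME READY STATUS RESTARTS AGE); `unhealthy_pods` is a hypothetical helper, not part of kubectl or Submarine.

```python
from typing import List

# Sample rows copied from the pod listing above (assumed column layout).
SAMPLE = """\
NAMESPACE   NAME                                              READY   STATUS             RESTARTS   AGE
default     notebook-controller-deployment-58797bdd75-9fstx   0/1     CrashLoopBackOff   7          12m
default     pytorch-operator-75fd845678-dpc52                 1/1     Running            0          12m
default     submarine-database-54776644c6-x2zrm               1/1     Running            0          12m
default     submarine-server-5d846f7b4f-m2fw8                 1/1     Running            0          12m
"""


def unhealthy_pods(table: str) -> List[str]:
    """Return names of pods that are not Running with all containers ready."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Locate columns by header name instead of hard-coding positions.
    name_col = header.index("NAME")
    ready_col = header.index("READY")
    status_col = header.index("STATUS")
    unhealthy = []
    for fields in body:
        ready, total = fields[ready_col].split("/")
        if fields[status_col] != "Running" or ready != total:
            unhealthy.append(fields[name_col])
    return unhealthy


print(unhealthy_pods(SAMPLE))
```

Checking READY as well as STATUS also catches pods that report Running while one of their containers is not yet ready.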
>
> The notebook-controller pod shows the following (kubectl describe pods
> notebook-controller-deployment-58797bdd75-9fstx):
>
> Events:
>   Type     Reason     Age                   From                     Message
>   ----     ------     ----                  ----                     -------
>   Normal   Scheduled  12m                   default-scheduler        Successfully assigned default/notebook-controller-deployment-58797bdd75-9fstx to docker-desktop
>   Normal   Pulling    12m                   kubelet, docker-desktop  Pulling image "apache/submarine:notebook-controller-v1.1.0-g253890cb"
>   Normal   Pulled     12m                   kubelet, docker-desktop  Successfully pulled image "apache/submarine:notebook-controller-v1.1.0-g253890cb"
>   Normal   Pulled     11m (x4 over 12m)     kubelet, docker-desktop  Container image "apache/submarine:notebook-controller-v1.1.0-g253890cb" already present on machine
>   Normal   Created    11m (x5 over 12m)     kubelet, docker-desktop  Created container manager
>   Normal   Started    11m (x5 over 12m)     kubelet, docker-desktop  Started container manager
>   Warning  BackOff    2m45s (x51 over 12m)  kubelet, docker-desktop  Back-off restarting failed container
>
> Logs (kubectl logs notebook-controller-deployment-58797bdd75-9fstx):
>
> 2020-11-02T17:11:53.159Z ERROR setup unable to create controller {"controller": "Notebook", "error": "no matches for kind \"Notebook\" in version \"kubeflow.org/v1beta1\""}
> github.com/go-logr/zapr.(*zapLogger).Error
> /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
> main.main
> /workspace/notebook-controller/main.go:76
> runtime.main
> /usr/local/go/src/runtime/proc.go:200
>
> I guess it might be the K8s version; I'm using Docker Desktop:
>
> kubectl version
> Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6",
> GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean",
> BuildDate:"2019-11-13T11:20:18Z", GoVersion:"go1.12.12", Compiler:"gc",
> Platform:"darwin/amd64"}
> Server Version: version.Info{Major:"1", Minor:"16+",
> GitVersion:"v1.16.6-beta.0",
> GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean",
> BuildDate:"2020-01-15T08:18:29Z", GoVersion:"go1.13.5", Compiler:"gc",
> Platform:"linux/amd64"}
>
> I'm not sure what to do now. When we released 0.4.0, we verified that
> 1.14, 1.15, and 1.16 all work.
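One way to rule the K8s version in or out is to compare the server's GitVersion against the minor versions verified for 0.4.0. A minimal sketch; `parse_git_version`, `is_verified` and `VERIFIED_MINORS` are hypothetical names, and the verified set is taken from the 1.14/1.15/1.16 list above. It parses GitVersion rather than the Minor field because the server above reports Minor as "16+".

```python
import re
from typing import Tuple

# Minor versions reported as verified for the 0.4.0 release.
VERIFIED_MINORS = {14, 15, 16}


def parse_git_version(git_version: str) -> Tuple[int, int, int]:
    """Parse a GitVersion such as "v1.16.6-beta.0" into (major, minor, patch)."""
    match = re.match(r"v(\d+)\.(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized GitVersion: {git_version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_verified(git_version: str) -> bool:
    """True if the version is 1.x with x in the verified minor set."""
    major, minor, _ = parse_git_version(git_version)
    return major == 1 and minor in VERIFIED_MINORS


print(is_verified("v1.16.6-beta.0"))
```

For the Docker Desktop server above, `is_verified("v1.16.6-beta.0")` returns True, so the server version falls within the verified range.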
>
> https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/README.md
>
> What is our recommendation for now?
>
> Thanks,
> Wangda
>
>
>
>
>
> On Fri, Oct 30, 2020 at 8:47 PM Wanqiang Ji wrote:
>
>> +1 for this RC1. Thanks to Kevin for driving this release and to everyone
>> for the great work.
>> I've done the following tests:
>> 1. Installed the helm charts on minikube
>> 2. Installed the helm charts on kind
>> 3. Checked the UI features: created notebooks/experiments and ran them
>>
>> BR,
>> Wanqiang Ji
>>
>>
>> On Wed, Oct 28, 2020 at 4:52 PM Zhankun Tang wrote:
>>
>> > Thanks for the great efforts, Kevin!
>> > I've done the following testing:
>> > 1. Verified the signatures (I signed Kevin's key and updated the KEYS file)
>> > 2. Built from source
>> > 3. Installed the helm charts on Docker Desktop k8s (v1.14.8)
>> > 4. Checked the basic experiment and notebook features of the UI
>> >
>> > I like the built-in examples in our notebook image. They make it
>> > straightforward for beginners to get familiar with our SDK.
>> > One minor suggestion: we could add a link in the parent README
>> > <https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/README.md>
>> > to our notebook.md doc. But this is not a blocker; we can improve the docs
>> > later since they're on GitHub for now.
>> >
>> > I'll give my *+1(binding)* to this RC1.
>> >
>> > @Wei-Chiu Chuang BTW, I can download the
>> > "apache/submarine:mini-0.5.0-RC1" image on my laptop.
>> >
>> > BR,
>> > Zhankun
>> >
>> > On Mon, 26 Oct 2020 at 10:21, Wei-Chiu Chuang wrote:
>> >
>> > > Curious: I see mini-0.5.0-RC1 and RC0 docker images; however, the
>> > > operator, server, database, and jupyter-notebook docker images are all
>> > > tagged 0.5.0. Was this intentional?
>> > >
>> > > I kept getting a disk-full error when trying to pull the RC1 mini-submarine
>> > > image: docker pull apache/submarine:mini-0.5.0-RC1
>> > >
>> > > failed to register layer: Error processing tar file(exit status 1): write /home/yarn/submarine/tf2-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so: no space left on device
>> > >
>> > > I've already run docker system prune to remove unnecessary volumes and
>> > > images, and my local disk has more than 200GB of space.
>> > >
>> > > After the release, we should update the release information in the Apache
>> > > system: https://reporter.apache.org/addrelease.html?submarine (I believe
>> > > access is restricted to PMC members).
>> > >
>> > > The docker images should be regarded as convenience binaries; our vote is
>> > > for the source code. So if you need to update the images, there's no need
>> > > to vote again.
>> > >
>> > > On Sun, Oct 25, 2020 at 12:46 PM Wei-Chiu Chuang wrote:
>> > >
>> > > > Not a blocker, but I noticed our docs have lots of "TODO" and "FIXME"
>> > > > markers, and then realized the docs are a WIP from SUBMARINE-518.
>> > > > I'd like to find time to contribute to the docs later.
>> > > >
>> > > > On Thu, Oct 22, 2020 at 4:16 AM Kevin Su wrote:
>> > > >
>> > > >> Hi folks,
>> > > >>
>> > > >> Thanks to everyone for their help on this release. Special thanks to
>> > > >> Wangda, Zhankun, Xun, Wei-Chiu, Wanqiang, Ryan, Manikandan, and JohnTing!
>> > > >>
>> > > >> I've created a release candidate (RC1) for Submarine 0.5.0. The
>> > > >> highlighted features are as follows:
>> > > >>
>> > > >> 1. Submarine Experiments: Redefined the experiment spec; code can be
>> > > >> synced from Git over HTTPS or SSH.
>> > > >>
>> > > >> 2. Predefined experiment template: Register an experiment template and
>> > > >> submit the related parameters to run an experiment using the REST API.
>> > > >>
>> > > >> 3. Environment profile: Users can easily manage their docker image and
>> > > >> conda environment.
>> > > >>
>> > > >> 4. Jupyter Notebook: Spawn a Jupyter notebook using the REST API, and
>> > > >> execute ML code on K8s or submit an experiment to the Submarine server.
>> > > >>
>> > > >> 5. Submarine Workbench UI: CRUD experiments, environments, and notebooks
>> > > >> through the UI.
>> > > >>
>> > > >> The RC tag in git is here:
>> > > >> https://github.com/apache/submarine/releases/tag/release-0.5.0-RC1
>> > > >> The RC release artifacts are available at:
>> > > >> http://home.apache.org/~pingsutw/submarine-0.5.0-RC1
>> > > >>
>> > > >> The mini-submarine image is here:
>> > > >> https://hub.docker.com/layers/apache/submarine/mini-0.5.0-RC1/images/sha256-b8bc864a9a6409361de96d93e467c6458c96f6d2d85c74639a201b1c1b9af3a0?context=explore
>> > > >>
>> > > >> The server image is here:
>> > > >> https://hub.docker.com/layers/apache/submarine/server-0.5.0/images/sha256-3c197696a773cebf3409acc5ed89504e9f56240a1748b107031bf32a4ba79e40?context=explore
>> > > >>
>> > > >> The database image is here:
>> > > >> https://hub.docker.com/layers/apache/submarine/database-0.5.0/images/sha256-fcf72289e0aa46e83fc8e65c8aca79be4bba96ec9813d54568e5679925cdc94f?context=explore
>> > > >>
>> > > >> The Jupyter Notebook image is here:
>> > > >> https://hub.docker.com/layers/apache/submarine/jupyter-notebook-0.5.0/images/sha256-1e05cdd3c814063b3cac9de12bdecd70475d38d708c06794c2d3b55ef97de82a?context=explore
>> > > >>
>> > > >> The Maven staging repository is here:
>> > > >> https://repository.apache.org/content/repositories/orgapachesubmarine-1015
>> > > >>
>> > > >> My public key is here:
>> > > >> https://dist.apache.org/repos/dist/release/submarine/KEYS
>> > > >>
>> > > >> *This vote will run for 7 days, ending on Oct 29, 2020, at 11:59 pm PST.*
>> > > >>
>> > > >> For testing, I have verified:
>> > > >>
>> > > >> 1. Building from source and running MNIST on Hadoop
>> > > >>
>> > > >> 2. The example with mini-submarine (both local and remote mode)
>> > > >>
>> > > >> 3. Experiment operations on K8s through the Submarine Server REST API
>> > > >> and PySubmarine
>> > > >>
>> > > >> 4. Workbench UI (experiment, environment, notebook)
>> > > >>
>> > > >> 5. HTTP code sync in experiments
>> > > >>
>> > > >> 6. Environment profile REST API
>> > > >>
>> > > >> 7. Notebook REST API
>> > > >>
>> > > >> Please follow these documents to test the features:
>> > > >> * https://github.com/apache/submarine/tree/master/dev-support/mini-submarine
>> > > >> * https://github.com/apache/submarine/blob/master/docs/user-guide-home.md
>> > > >>
>> > > >> My +1 to start. Thanks!
>> > > >>
>> > > >> BR,
>> > > >> Kevin Su
>> > > >>
>> > > >
>> > >
>> >
>>
>