*Update:*

After restarting K8s, the problem (cannot connect to the notebook) is gone.
I'm not sure whether anybody else has hit this issue; it would be better to
add troubleshooting guidance to the user doc.

Verified the following:

- Ran the example code inside the notebook: the deepfm_example runs without
any issue. "submarine-experiment_sdk" also runs fine; the only issue is that
"submarine_client.list_experiments(status=status)" takes some time to
execute. I checked the pod status:

NAME                  READY   STATUS            RESTARTS   AGE
mnist-dist-ps-0       0/1     PodInitializing   0          103s
mnist-dist-worker-0   0/1     PodInitializing   0          103s


(QUESTION) I quickly clicked the "run" button on the notebook page for all
paragraphs (except the last one), and the experiment showed up in the UI only
after a while (1-2 mins). I'm not sure why it takes that long, and it is hard
to tell what is happening from either the notebook or the UI.
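
Since the pod listing above shows both pods in PodInitializing, one way to see whether the delay comes from pods that have not started yet is to filter the `kubectl get pods` output for anything that is not Running or Completed. The helper below is only a sketch: the function name `pods_not_running` is made up for illustration, and it assumes the default table layout with NAME and STATUS columns in the header.

```python
def pods_not_running(kubectl_output):
    """Return {pod_name: status} for pods that are not Running/Completed.

    Assumes the default `kubectl get pods` table, whose header row
    contains NAME and STATUS columns.
    """
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    if not lines:
        return {}

    # Locate the columns by header name instead of a fixed position, so
    # the helper also works when a NAMESPACE column comes first.
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")

    pending = {}
    for line in lines[1:]:
        cols = line.split()
        if len(cols) <= max(name_idx, status_idx):
            continue  # skip rows that are too short to parse
        if cols[status_idx] not in ("Running", "Completed"):
            pending[cols[name_idx]] = cols[status_idx]
    return pending
```

Pasting the listing above into this function would report both mnist-dist pods as PodInitializing, which would at least be consistent with the experiment showing up late in the UI.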

- (NEED HELP) After that, I tried to restart the notebook. I clicked the
"Delete" button on the UI, and it showed this message: "Http failure
response for http://127.0.0.1:32080/api/v1/notebook/null: 404 Not Found"

- (NEED HELP) Also, I cannot launch a new notebook session: clicking "+ New
Notebook" has no effect, and the Chrome console shows this error:

"ERROR TypeError: Cannot read property 'environment' of null
    at Object.eval [as updateRenderer] (ng:///NotebookModule/NotebookComponent.ngfactory.js:92)
    at Object.debugUpdateRenderer [as updateRenderer] (vendor.js:88356)
    at checkAndUpdateView (vendor.js:87731)"

- (QUESTION) Also, on the Notebook List UI, the Environment, Docker Image,
Resources, and Status columns are all empty for the running notebook
"notebook1111". I don't know whether that is normal.

- (QUESTION) On the environment page, "my-submarine-env" shows up. Is there
any example that uses my-submarine-env? If it was introduced by some
previous test, I think we should remove it. (We should only ship usable
examples/configs in a release.)

- (NEED HELP) I tried to follow the notebook guide (
https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/notebook.md)
and run the Experiment example (see "Experiment with your notebook"). It
fails with the error message below:

---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
<ipython-input-1-cc077e98d460> in <module>
     30
     31 # Create experiment
---> 32 experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api/experiment_client.py in create_experiment(self, experiment_spec)
     57         :return: submarine experiment
     58         """
---> 59         response = self.experiment_api.create_experiment(experiment_spec=experiment_spec)
     60         return response.result
     61

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api/experiment_api.py in create_experiment(self, **kwargs)
     75         """
     76         kwargs['_return_http_data_only'] = True
---> 77         return self.create_experiment_with_http_info(**kwargs)  # noqa: E501
     78
     79     def create_experiment_with_http_info(self, **kwargs):  # noqa: E501

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api/experiment_api.py in create_experiment_with_http_info(self, **kwargs)
    163             _preload_content=local_var_params.get('_preload_content', True),
    164             _request_timeout=local_var_params.get('_request_timeout'),
--> 165             collection_formats=collection_formats)
    166
    167     def delete_experiment(self, id, **kwargs):  # noqa: E501

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout, _host)
    417                                    auth_settings, _return_http_data_only,
    418                                    collection_formats, _preload_content,
--> 419                                    _request_timeout, _host)
    420
    421         return self.pool.apply_async(

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout, _host)
    218         except ApiException as e:
    219             e.body = e.body.decode('utf-8') if six.PY3 else e.body
--> 220             raise e
    221
    222         content_type = response_data.getheader('content-type')

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout, _host)
    215                                          body=body,
    216                                          _preload_content=_preload_content,
--> 217                                          _request_timeout=_request_timeout)
    218         except ApiException as e:
    219             e.body = e.body.decode('utf-8') if six.PY3 else e.body

/opt/conda/lib/python3.7/site-packages/submarine/experiment/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    461                                          _preload_content=_preload_content,
    462                                          _request_timeout=_request_timeout,
--> 463                                          body=body)
    464         elif method == "PUT":
    465             return self.rest_client.PUT(url,

/opt/conda/lib/python3.7/site-packages/submarine/experiment/rest.py in POST(self, url, headers, query_params, post_params, body, _preload_content, _request_timeout)
    324                             _preload_content=_preload_content,
    325                             _request_timeout=_request_timeout,
--> 326                             body=body)
    327
    328     def PUT(self,

/opt/conda/lib/python3.7/site-packages/submarine/experiment/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    247
    248         if not 200 <= r.status <= 299:
--> 249             raise ApiException(http_resp=r)
    250
    251         return r

ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 03 Nov 2020 00:08:40 GMT', 'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '140', 'Server': 'Jetty(9.4.18.v20190429)'})
HTTP response body:
{"status":"CONFLICT","code":409,"success":null,"message":"K8s submitter: parse Job object failed by Conflict","result":null,"attributes":{}}
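
The traceback hides the useful part of the failure (the JSON body) at the very bottom. A small helper along the lines below could surface the HTTP status and the server message directly in the notebook. This is a sketch under assumptions: the names `summarize_api_error` and `is_conflict` are hypothetical, and it assumes the raised exception exposes `status` and `body` attributes, which the `e.body` handling and the "(409)" line in the traceback suggest but do not prove. One possible reading of the 409, not confirmed anywhere in this thread, is that an experiment with the same name already exists from the earlier run.

```python
import json


def summarize_api_error(exc):
    """Return (status, message) for an exception raised by a REST client.

    Assumption: the exception carries `status` and `body` attributes,
    as the ApiException in the traceback above appears to.
    """
    status = getattr(exc, "status", None)
    body = getattr(exc, "body", None)
    if isinstance(body, bytes):
        body = body.decode("utf-8")

    message = None
    if body:
        try:
            parsed = json.loads(body)
        except ValueError:
            # Body is not JSON; fall back to the raw text.
            message = body
        else:
            message = parsed.get("message") if isinstance(parsed, dict) else body
    return status, message


def is_conflict(exc):
    """True when the server answered with HTTP 409 Conflict."""
    return getattr(exc, "status", None) == 409
```

In a notebook cell, the call to create_experiment could be wrapped in a try/except that prints `summarize_api_error(e)`, so the user sees "K8s submitter: parse Job object failed by Conflict" without scrolling through the whole stack.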

- Also, I submitted a PR (https://github.com/apache/submarine/pull/444) with
documentation-related improvements. Please help review it; I think it is
important to get these issues fixed.


On Mon, Nov 2, 2020 at 9:54 AM Wangda Tan <[email protected]> wrote:

> Nevermind, please ignore the previous message. I think it is caused by
> previous installation (0.4.0), I reset K8s cluster and now everything can
> be installed, trying to follow other steps now.
>
> 1) Why does the Submarine server login have maria_dev as the default user
> name?
>
> 2) There's no doc about using Submarine UI, we need to add one, I will
> file a PR later (TODO)
>
> 3) Trying to create a notebook: the name I initially gave was
> "nb_123", but I got the error "K8s submitter: parse Job object failed by
> Unprocessable Entity, please try again".
> If there are restrictions on notebook names, we should state them in
> the UI (e.g. only letters or numbers are supported).
> The doc says: "Name of the notebook server. It should be
> unique and include no spaces." We need to update both the doc and the UI.
>
> 4) When I choose the environment for notebook, I saw both my-submarine-env
> and notebook-env.  What are the differences between the two?
> - I found only notebook-env works, "my-submarine-env" failed to start. We
> should remove the my-submarine-env from the default helm installation.
>
> 5) Also, I found that even if the notebook is not fully started and
> running, the UI indicates it is created:
>
> [image: image.png]
>
> Clicking notebook name will show you an error page, we should improve this
> part.
>
> 6) After waiting for ~5 mins, the notebook started, but I still cannot
> access the notebook UI, clicking the link on the Submarine UI tells me:
>
> HTTP ERROR 404
>
> Problem accessing /notebook/default/notebook1111/. Reason:
>
>     Not Found
>
>
> And logs for the notebook pod tells me;
>
>
>  kubectl logs notebook1111-0
> Conda current version is currentVersion=4.8.3;. Moving forward with env 
> creation and activation.
> [I 17:45:50.056 NotebookApp] Writing notebook server cookie secret to 
> /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
> [W 17:45:52.108 NotebookApp] All authentication is disabled.  Anyone who can 
> connect to this server will be able to run code.
> [I 17:45:52.144 NotebookApp] Serving notebooks from local directory: 
> /home/jovyan
> [I 17:45:52.145 NotebookApp] Jupyter Notebook 6.1.3 is running at:
> [I 17:45:52.146 NotebookApp] 
> http://notebook1111-0:8888/notebook/default/notebook1111/
> [I 17:45:52.147 NotebookApp] Use Control-C to stop this server and shut down 
> all kernels (twice to skip confirmation).
>
>
> Is it bound to the wrong port?
>
>
> On Mon, Nov 2, 2020 at 9:18 AM Wangda Tan <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> Thank you so much for running this release.
>>
>> I'm trying to follow the helm install steps, but the notebook controller
>> fails to start.
>>
>> I downloaded the RC1 source code and followed the guidance:
>> https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/helm.md
>>
>> The pods in my system:
>>
>> NAMESPACE   NAME                                              READY   STATUS             RESTARTS   AGE
>> default     notebook-controller-deployment-58797bdd75-9fstx   0/1     CrashLoopBackOff   7          12m
>> default     pytorch-operator-75fd845678-dpc52                 1/1     Running            0          12m
>> default     submarine-database-54776644c6-x2zrm               1/1     Running            0          12m
>> default     submarine-server-5d846f7b4f-m2fw8                 1/1     Running            0          12m
>> default     submarine-traefik-d55c689b5-cq6db                 1/1     Running            0          12m
>> default     tf-job-operator-598686fd84-2fwt9                  1/1     Running            0          12m
>>
>> And notebook-controller pod has the following information:  (kubectl
>> describe pods notebook-controller-deployment-58797bdd75-9fstx)
>>
>> Events:
>>   Type     Reason     Age                   From                     Message
>>   ----     ------     ----                  ----                     -------
>>   Normal   Scheduled  12m                   default-scheduler        Successfully assigned default/notebook-controller-deployment-58797bdd75-9fstx to docker-desktop
>>   Normal   Pulling    12m                   kubelet, docker-desktop  Pulling image "apache/submarine:notebook-controller-v1.1.0-g253890cb"
>>   Normal   Pulled     12m                   kubelet, docker-desktop  Successfully pulled image "apache/submarine:notebook-controller-v1.1.0-g253890cb"
>>   Normal   Pulled     11m (x4 over 12m)     kubelet, docker-desktop  Container image "apache/submarine:notebook-controller-v1.1.0-g253890cb" already present on machine
>>   Normal   Created    11m (x5 over 12m)     kubelet, docker-desktop  Created container manager
>>   Normal   Started    11m (x5 over 12m)     kubelet, docker-desktop  Started container manager
>>   Warning  BackOff    2m45s (x51 over 12m)  kubelet, docker-desktop  Back-off restarting failed container
>>
>> Logs: (kubectl log notebook-controller-deployment-58797bdd75-9fstx)
>>
>> 2020-11-02T17:11:53.159Z ERROR setup unable to create controller
>> {"controller": "Notebook", "error": "no matches for kind \"Notebook\" in
>> version \"kubeflow.org/v1beta1\""}
>> github.com/go-logr/zapr.(*zapLogger).Error
>> /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
>> main.main
>> /workspace/notebook-controller/main.go:76
>> runtime.main
>> /usr/local/go/src/runtime/proc.go:200
>>
>> I guess it might be version of K8s, I'm using DockerDesktop:
>>
>> kubectl version
>> Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6",
>> GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean",
>> BuildDate:"2019-11-13T11:20:18Z", GoVersion:"go1.12.12", Compiler:"gc",
>> Platform:"darwin/amd64"}
>> Server Version: version.Info{Major:"1", Minor:"16+",
>> GitVersion:"v1.16.6-beta.0",
>> GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean",
>> BuildDate:"2020-01-15T08:18:29Z", GoVersion:"go1.13.5", Compiler:"gc",
>> Platform:"linux/amd64"}
>>
>> I'm not sure what I should do now? Last time when we release 0.4.0, we
>> verified 1.14, 1.15, 1.16 should all work.
>>
>>
>> https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/README.md
>>
>> What is our recommendation for now?
>>
>> Thanks,
>> Wangda
>>
>>
>>
>>
>>
>> On Fri, Oct 30, 2020 at 8:47 PM Wanqiang Ji <[email protected]> wrote:
>>
>>> +1 for this RC1. Thanks Kevin drive this release and everyone for the
>>> great
>>> works.
>>> I've done the following tests:
>>> 1. Install the helm charts against the minikube
>>> 2. Install the helm charts against the kind
>>> 3. Check the UI's features, create the notebook/experiment and run work
>>>
>>> BR,
>>> Wanqiang Ji
>>>
>>>
>>> On Wed, Oct 28, 2020 at 4:52 PM Zhankun Tang <[email protected]> wrote:
>>>
>>> > Thanks for the great efforts! Kevin!
>>> > I've done the below testing
>>> > 1. Verify the signatures (I signed Kevin's key and updated the KEYS
>>> file)
>>> > 2. Build from source
>>> > 3. Install the helm charts against Docker desktop k8s (v1.14.8)
>>> > 4. Check basic UI's experiment and notebook features
>>> >
>>> > And I like the built-in examples in our notebook image. It's
>>> > straightforward for beginners to get familiar with our SDK.
>>> > One minor suggestion is that we can add a link in parent readme
>>> > <
>>> >
>>> https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/README.md
>>> > >
>>> > to our notebook.md doc. But this is not a blocker, we can improve docs
>>> > later since it's in GitHub for now.
>>> >
>>> > I'll give my *+1(binding)* to this RC1.
>>> >
>>> > @Wei-Chiu Chuang <[email protected]> BTW, I can download the
>>> > "apache/submarine:mini-0.5.0-RC1" image on my laptop.
>>> >
>>> > BR,
>>> > Zhankun
>>> >
>>> > On Mon, 26 Oct 2020 at 10:21, Wei-Chiu Chuang
>>> <[email protected]
>>> > >
>>> > wrote:
>>> >
>>> > > Curious -- I see mini-0.5.0-RC1 and RC0 docker images, however, the
>>> > > operator, server, database and jupyter-notebook docker images are all
>>> > > tagged version 0.5.0. Was this intentional?
>>> > >
>>> > > I kept getting disk full error trying to pull the RC1 minisubmarine
>>> > > image: docker pull apache/submarine:mini-0.5.0-RC1
>>> > >
>>> > > failed to register layer: Error processing tar file(exit status 1):
>>> write
>>> > >
>>> > >
>>> >
>>> /home/yarn/submarine/tf2-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so:
>>> > > no space left on device
>>> > >
>>> > > I've already run docker system prune to remove unnecessary volumes
>>> and
>>> > > images, and my local disk has more than 200GB of space.
>>> > >
>>> > > After the release, we should update the release information in the
>>> Apache
>>> > > system: https://reporter.apache.org/addrelease.html?submarine (I
>>> believe
>>> > > the access is restricted to PMCs)
>>> > >
>>> > > The docker images should be regarded as convenience binary, and our
>>> vote
>>> > is
>>> > > for the source code. So if you need to update the images, no need to
>>> > > vote again.
>>> > >
>>> > > On Sun, Oct 25, 2020 at 12:46 PM Wei-Chiu Chuang <[email protected]
>>> >
>>> > > wrote:
>>> > >
>>> > > > Not a blocker, but I do notice our doc has lots of "TODO" and
>>> "FIXME"
>>> > and
>>> > > > then realized our doc is a WIP from SUBMARINE-518.
>>> > > > I'd like to find a time to contribute to the docs later.
>>> > > >
>>> > > > On Thu, Oct 22, 2020 at 4:16 AM Kevin Su <[email protected]>
>>> wrote:
>>> > > >
>>> > > >> Hi folks,
>>> > > >>
>>> > > >>
>>> > > >> Thanks to everyone's help on this release. Special thanks to
>>> Wangda,
>>> > > >>
>>> > > >> Zhankun, Xun, Wei-Chiu, Wanqiang, Ryan, Manikandan, and JohnTing!
>>> > > >>
>>> > > >> I've created a release candidate (RC1) for submarine 0.5.0. The
>>> > > >> highlighted
>>> > > >>
>>> > > >> features are as follows:
>>> > > >>
>>> > > >> 1. Submarine Experiments: Redefined the experiment spec, sync up
>>> code
>>> > > from
>>> > > >> Git, it could be https and ssh
>>> > > >>
>>> > > >> 2. Predefined experiment template: Register A experiment template
>>> and
>>> > > >> submit related parameter to run an experiment using Rest API
>>> > > >>
>>> > > >> 3. Environment profile: Users could easily manage their docker
>>> image
>>> > and
>>> > > >> conda environment
>>> > > >>
>>> > > >> 4. Jupyter Notebook: Spawn a jupyter notebook using Rest API, and
>>> > > execute
>>> > > >> ML code on K8s, or submit an experiment to submarine server
>>> > > >>
>>> > > >> 5. Submarine Workbench UI: CRUD Experiment, Environment, Notebook
>>> > > through
>>> > > >> the UI
>>> > > >>
>>> > > >> The RC tag in git is here:
>>> > > >>
>>> https://github.com/apache/submarine/releases/tag/release-0.5.0-RC1
>>> > > >> The RC release artifacts are available at:
>>> > > >> http://home.apache.org/~pingsutw/submarine-0.5.0-RC1
>>> > > >>
>>> > > >> The mini-submarine image is here:
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://hub.docker.com/layers/apache/submarine/mini-0.5.0-RC1/images/sha256-b8bc864a9a6409361de96d93e467c6458c96f6d2d85c74639a201b1c1b9af3a0?context=explore
>>> > > >>
>>> > > >> The server image is here:
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://hub.docker.com/layers/apache/submarine/server-0.5.0/images/sha256-3c197696a773cebf3409acc5ed89504e9f56240a1748b107031bf32a4ba79e40?context=explore
>>> > > >>
>>> > > >> The database image is here:
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://hub.docker.com/layers/apache/submarine/database-0.5.0/images/sha256-fcf72289e0aa46e83fc8e65c8aca79be4bba96ec9813d54568e5679925cdc94f?context=explore
>>> > > >>
>>> > > >> The Jupyter Notebook image is here:
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://hub.docker.com/layers/apache/submarine/jupyter-notebook-0.5.0/images/sha256-1e05cdd3c814063b3cac9de12bdecd70475d38d708c06794c2d3b55ef97de82a?context=explore
>>> > > >>
>>> > > >> The Maven staging repository is here:
>>> > > >>
>>> > >
>>> >
>>> https://repository.apache.org/content/repositories/orgapachesubmarine-1015
>>> > > >>
>>> > > >> My public key is here:
>>> > > >> https://dist.apache.org/repos/dist/release/submarine/KEYS
>>> > > >>
>>> > > >> *This vote will run for 7 days, ending on Oct 29, 2020, at 11:59
>>> pm
>>> > > PST.*
>>> > > >>
>>> > > >> For the testing, I have verified the
>>> > > >>
>>> > > >> 1. Build from source, Run the mnist on Hadoop
>>> > > >>
>>> > > >> 2. Example with mini-submarine(both local and remote mode)
>>> > > >>
>>> > > >> 3. Verified the experiment operations to K8s by Submarine Server
>>> REST
>>> > > and
>>> > > >> PySubmarine.
>>> > > >>
>>> > > >> 4. Workbench UI (experiment, environment, notebook)
>>> > > >>
>>> > > >> 5. HTTP sync code in the experiment
>>> > > >>
>>> > > >> 6. Environment profile REST API
>>> > > >>
>>> > > >> 7. Notebook REST API
>>> > > >>
>>> > > >>
>>> > > >> Please follow the document to test these features.
>>> > > >> *
>>> > > >>
>>> > >
>>> >
>>> https://github.com/apache/submarine/tree/master/dev-support/mini-submarine
>>> > > >> *
>>> > >
>>> https://github.com/apache/submarine/blob/master/docs/user-guide-home.md
>>> > > >>
>>> > > >> My +1 to start. Thanks!
>>> > > >>
>>> > > >> BR,
>>> > > >> Kevin Su
>>> > > >>
>>> > > >
>>> > >
>>> >
>>>
>>
