rahul3 commented on pull request #18: URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-900704222
Hello, thank you for reviewing this thoroughly. To answer your questions (I'm sorry for this being lengthy, but I don't know how else to convey it):

> I don't think that we have to install python stuff from maven command. I guess it is not standard way. I think it should be scripts - install.sh(cmd), maybe with uninstall.sh(cmd)

**If the Ctxserver is considered to be an integral part of NLPCraft**, I would disagree with this, for the following reasons:

- Verification of every release would become more complex. Currently, `mvn clean package verify` (or one of its variants) is used as the factor that determines whether a release is good, without checking the Python part at all. It is much cleaner and more accurate to validate a release with a single build that exercises all areas of the project and their dependencies. Having the setup scripts run in Python with only standard libraries (given that we require Python 3+) keeps them OS independent, which helps, in my opinion.
- In my opinion, the implementation of a Maven build is project dependent, especially when a project depends on multiple languages (and their dependencies). I think the approach in this pull request is a good way to do it, since our project falls in that category. It would be even more beneficial to build profiles per operating system; in my personal experience, I have seen this implementation used in a production system.
- One of the issues with `.sh` and `.cmd` scripts is that they are OS dependent. Sure, we could build profiles and then invoke the appropriate script per OS. That is also a solution, but a different one.

> Do we need conda ? (I am not familiar with python tools, is it necessary for python?) I ask this question because we require too many tools for nlpcraft installation.

Managing Python and its dependencies is not as streamlined as it is in Java (in my opinion). Most users today download Anaconda or Miniconda and install Python that way.
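To make the per-OS profile idea above concrete, a Maven profile can be activated by OS family. This is purely an illustrative sketch, not part of this PR; the profile ids are made up:

```xml
<!-- Sketch only: profile ids are hypothetical. -->
<profiles>
    <profile>
        <id>python-setup-unix</id>
        <activation>
            <os><family>unix</family></os>
        </activation>
        <!-- bind the Unix-specific Python setup steps here -->
    </profile>
    <profile>
        <id>python-setup-windows</id>
        <activation>
            <os><family>windows</family></os>
        </activation>
        <!-- bind the Windows-specific Python setup steps here -->
    </profile>
</profiles>
```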
[Conda](https://docs.conda.io/en/latest/) is a package management and environment management system for Python. It ships with Anaconda and Miniconda and is extremely common in the Python world.

A case example for why we need a virtual environment for Python: suppose your system Python is 3.5 and you use dependency X at version 1.1. If NLPCraft needs dependency X at version 1.3, installing NLPCraft will uninstall 1.1, and you will have an issue with anything else that depends on 1.1 (assuming something you rely on changed between 1.1 and 1.3). Also, certain dependencies only support certain Python versions at times; for example, there was a period when TensorFlow supported Python 3.6 but not the newly released 3.7+.

@skhdl Note: I'm just explaining in case someone less familiar with the project and dependency management reads this. I'm well aware that you are very familiar with dependency management and I don't mean to be condescending :)

The production-like way to do this (leaving Docker aside), in my opinion, is to use a virtual environment so that the system Python is not affected. That leaves you with largely two options:

- [virtualenv](https://docs.python.org/3/library/venv.html)
- [conda](https://docs.conda.io/)

The issue with virtualenv is that it can only use the Python version already on the user's system: if you have 3.5, your virtual environment will use Python 3.5 only. Conda lets you manage the Python version as well, which is primarily why I chose it. For our project, this eliminates technical debt by supporting one specific Python version: we will not have to care when 3.9, 4.0, etc. get released, unless our dependencies are affected. We also give the developer the option to set the Python version and environment (in the way I currently designed it).
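To illustrate the Python-version pinning point, a minimal conda environment definition can pin the interpreter itself, which plain virtualenv cannot do. The name, version, and dependencies below are placeholders, not what this PR actually uses:

```yaml
# environment.yml -- illustrative sketch only
name: nlpcraft-ctxword        # hypothetical environment name
channels:
  - defaults
dependencies:
  - python=3.8                # conda pins the interpreter version itself
  - pip
  - pip:
      - flask                 # placeholder dependencies; the real list
      - torch                 # lives in the project's requirements
```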
In my opinion, a conda environment is a good option in production as well. The developer either creates an `environment.yml` file to recreate the conda environment or zips the entire environment folder and installs it that way. I work under the assumption that, in production, there is no internet connection available after packaging; this is the case in many companies (especially financial ones). My goal was to take our project as close to a real-world production implementation as possible, so there is minimal work left for the developer using our project.

> If we need conda I guess we have to require user to configure CONDA_HOME env variable, but shouldn't ask him to update scripts to set path to conda.

I agree. I could work on this.

> Where is bert models etc installation place?

Same location; I did not change this: <project_root>/nlpcraft/src/main/python/ctxword/. Eventually, we should also put this in `.nlpcraft-python`.

> When the models should be downloaded?
> - during installation (I think so) or

I think so too. This pull request first establishes a virtual environment for Python and gets everything set up there. We can deal with the model download location once that is complete. This is my opinion.

> I guess we need to add some progressbar for such long processes as downloading models (It is not clear - is the process going and hanging)

Yes, you are right. I can work on that after we establish the Python environment separate from the user's system Python environment. It works fine on Ubuntu 20.04 for me; I tried it on macOS and, as you noted, it hangs. I will work on fixing this.

> When downloading process is killed on some intermediate phase (for example, some files downloaded, some aren't, some broken etc) - how should it be resolved? Is it supported to fix such situations?

Can you give me an example? I believe you might be talking about FastText. I noticed this too recently, and I am fixing it.
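For what it's worth, one common way to detect a partial or corrupted model download is to compare the file on disk against a published checksum and re-download on mismatch. This is just a sketch of that idea (the function names are hypothetical, not part of this PR):

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the model file is missing or does not match its
    expected checksum (i.e. a partial or corrupted download)."""
    return not path.exists() or file_sha256(path) != expected_sha256
```

A setup script could then safely re-run after being killed mid-download: any file that fails the check is simply fetched again.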
Thank you for pointing it out.

> I guess server should be started and accept HTTP connections after all models downloading process finished.

I agree.

> When I deleted models from <project_root>/nlpcraft/src/main/python/ctxword/, start-server script printed that downloading in progress, but new models files weren't appear. The process hangs and I don't see any errors.

Yes, as I mentioned above, I found this issue on macOS. I'll fix it.

> Other minor issues
> org.apache.nlpcraft.server.nlp.core.spacy.NCSpaCyNerEnricher.Config - should be fixed

Will look into it and fix it.

> Why some logs seem duplicated?

There was a previous issue, which I fixed, where I mistakenly added a second log handler in the logging module. I thought I had resolved this. Does it still happen after you pull the most recent updates?

Note: I also found some issues on Windows that I am working on fixing. I'm not sure whether I should keep this pull request open and fix these issues, or close it and start a new one. Please feel free to advise me on this.

Thank you for your review.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
