The solution for Python in Bigtop may not be quite the same as the process for detecting Java home.
For example, for the runtime Python environment of PySpark, the most common approach today is to manage it with Conda. Once the Python environment is set up with Conda, you can simply activate it each time it is used. Alternatively, you can package the Python environment with Conda and upload it to HDFS for Spark to use. One approach could be to write a small tool that simplifies setting up Python with Conda.

> On Dec 23, 2023, at 20:23, Masatake Iwasaki <[email protected]> wrote:
>
> It basically sounds good.
>
> We already know that the package dependencies of Bigtop are not exhaustive.
> While many products depend on Java, there is no package dependency on Java.
> Users can choose a JDK distribution based on their preferences.
> Maybe we can provide utilities like bigtop-detect-javahome of bigtop-utils
> as follow-up work.
>
> Since the change affects all platforms, fixing issues like BIGTOP-3978 and
> BIGTOP-3979 first may make the work easier from a testing perspective.
>
> On 2023/12/18 16:04, Jialiang Cai wrote:
>> My apologies, I didn't clarify earlier. I don't want to remove Python 3.
>> What I mean is to remove the 'require: python' dependency in the Spark spec
>> and control. This way, installing Spark won't require a Python dependency.
>> If users need to use PySpark, they can manually install the corresponding
>> Python version using Conda.
>>
>> Additionally, there are many extra installations in the Bigtop code for
>> managing Python 2. As far as I know, all components now support Python 3,
>> and Python 2 has been deprecated for a long time. Bigtop just hasn't done
>> the Python 3 upgrade work yet.
>>
>> This is because it involves the Python version in Spark 3 packaging, GPDB
>> Python, Ranger Python, and Phoenix Python dependencies, but these issues can
>> be resolved.
>>
>> Ambari used to strongly depend on Python 2, but Ambari has been dropped from
>> Bigtop. None of the other components have a strong dependency on Python 2.
>> In Spark, PySpark can be managed separately by users, so specifying a Python
>> 3 version in the packaging isn't a good choice.
>> GPDB 6 officially supports Python 3.
>> Ranger doesn't necessarily require Python for installation; although it has
>> some Python 2 scripts, they are used relatively sparingly.
>> So, one of the goals of this discussion is to remove Python as a dependency
>> for Spark installation and to facilitate the future upgrade of Python 2 to
>> Python 3 in Bigtop.
>>
>>> On Dec 18, 2023, at 14:50, 李帅 <[email protected]> wrote:
>>>
>>> python3 has a lot of compatibility issues, different linux distro have
>>> different python3 versions.
>>>
>>> On Mon, Dec 18, 2023, at 09:46, Jialiang Cai <[email protected]> wrote:
>>>
>>>> Dear Community Members,
>>>>
>>>> I would like to initiate a discussion regarding the removal of Python from
>>>> the Spark3 installation package. Here are a few reasons for considering
>>>> this change:
>>>>
>>>> 1. Unlike Apache Ambari, which installs components individually, Spark3's
>>>> core functionality does not depend on Python3. Therefore, it may not be
>>>> appropriate to make Python3 a mandatory installation dependency for Spark.
>>>> Spark itself can run without Python3, and users who do not intend to use
>>>> PySpark should still be able to install and use Spark without any issues.
>>>>
>>>> 2. The Python3 version required by PySpark is often relatively high, and
>>>> many operating systems do not provide such high Python versions by default.
>>>> Including PySpark's Python3 dependency in the Bigtop codebase would
>>>> introduce significant complexity. It might be more suitable for users to
>>>> manually install the specific Python3 version required by PySpark, perhaps
>>>> using Conda or other methods.
>>>>
>>>> 3. Removing the Python3 dependency from Spark can also benefit the overall
>>>> transition of Bigtop from Python2 to Python3. Python2 has not been
>>>> maintained for a considerable period, and streamlining the codebase to work
>>>> with Python3 can be a step toward maintaining the project's relevance and
>>>> security.
>>>>
>>>> I encourage everyone to share their thoughts and opinions on this matter.
>>>> Your feedback is valuable as we consider the best course of action.
>>>>
>>>> Thank you for your participation and input.
>>>>
>>>> Best regards,
>>>> jiaLiang
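
P.S. A rough sketch of the small Conda helper suggested at the top of this reply. It assumes the separate conda-pack package is installed where conda can find it, and that Spark ships the packed environment to executors through --archives. The environment name, Python version, HDFS directory, and both function names are hypothetical and are not Bigtop defaults. The script only builds and prints the commands; it does not run them.

```python
import shlex

# Illustrative assumptions only; none of these values come from Bigtop.
ENV_NAME = "pyspark_env"
PYTHON_VERSION = "3.9"
HDFS_DIR = "hdfs:///tmp/envs"   # hypothetical upload location
UNPACK_DIR = "environment"      # alias under which Spark unpacks the archive


def build_commands(env_name=ENV_NAME, python_version=PYTHON_VERSION,
                   hdfs_dir=HDFS_DIR):
    """Return the setup commands as argument lists; nothing is executed."""
    archive = f"{env_name}.tar.gz"
    return [
        # 1. Create a Conda environment with the Python version PySpark needs.
        ["conda", "create", "-y", "-n", env_name, f"python={python_version}"],
        # 2. Package it into a relocatable archive (needs conda-pack installed).
        ["conda", "pack", "-n", env_name, "-o", archive],
        # 3. Upload the archive to HDFS so Spark can fetch it.
        ["hdfs", "dfs", "-put", "-f", archive, f"{hdfs_dir}/{archive}"],
    ]


def spark_submit_command(app, env_name=ENV_NAME, hdfs_dir=HDFS_DIR):
    """Return (extra environment variables, spark-submit argument list)."""
    archive_uri = f"{hdfs_dir}/{env_name}.tar.gz#{UNPACK_DIR}"
    # Executors use the interpreter from the unpacked archive.
    env = {"PYSPARK_PYTHON": f"./{UNPACK_DIR}/bin/python"}
    argv = ["spark-submit", "--archives", archive_uri, app]
    return env, argv


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
    env, argv = spark_submit_command("app.py")
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    print(prefix, shlex.join(argv))
```

Running the script prints the command lines so they can be reviewed before being pasted into a shell. In YARN cluster mode the driver-side Python setting may need to be passed differently, so treat the last line as a starting point only.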
