Today, Spark's CLI is spread across 20+ scripts in bin/ and sbin/. This
fragmented interface makes discovery more difficult for users and maintenance
more difficult for contributors.

I would like to propose adding a new command line tool named spark that unifies
Spark's command line interface. It would be a pure Python script that sticks to
the standard library and acts simply (at least at first) as a dispatcher. It
would offer users a single entry point with a clear --help that makes it easy
to discover and understand what's available.

The commands I would like to implement are as follows:

spark
    connect
        start
        status
        stop
    history
        start
        status
        stop
    master
        start
        status
        stop
    pipelines
    python
    scala
    sql
    submit
    thrift
        start
        status
        stop
    worker[s]
        decommission
        run
        start
        status
        stop

These commands all dispatch to existing scripts. The unified Python CLI would
only implement sub-commands itself when the underlying script does not already
offer its own handling of CLI arguments. For example, pipelines accepts various
commands, but those are handled already by its own CLI so the unified CLI
doesn't need to implement them. connect, on the other hand, implements start
and stop as scripts, so we need to implement those in our unified CLI and route
the arguments to the appropriate script.
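
As a rough sketch of the dispatch described above (not the proposed implementation): the routing tables, the resolve() and main() helpers, and the exact script paths below are assumptions chosen for illustration, and only a few commands are shown. Pass-through commands hand every remaining argument to the underlying script, while connect picks a script based on the sub-command.

```python
import os
import sys

# Hypothetical routing tables. The script paths are assumptions for
# illustration and would need to be checked against the actual bin/ and
# sbin/ contents.
PASSTHROUGH = {
    # These scripts parse their own arguments, so everything is forwarded.
    "pipelines": "bin/spark-pipelines",
    "sql": "bin/spark-sql",
    "submit": "bin/spark-submit",
}
SERVICES = {
    # These are split across one script per action, so the unified CLI
    # has to choose the script itself.
    "connect": {
        "start": "sbin/start-connect-server.sh",
        "stop": "sbin/stop-connect-server.sh",
    },
}


def resolve(argv, spark_home):
    """Translate unified-CLI arguments into the argv of an existing script."""
    if not argv:
        raise SystemExit("usage: spark <command> [args...]")
    command, rest = argv[0], argv[1:]

    if command in PASSTHROUGH:
        return [os.path.join(spark_home, PASSTHROUGH[command]), *rest]

    if command in SERVICES:
        actions = SERVICES[command]
        if not rest or rest[0] not in actions:
            choices = "|".join(actions)
            raise SystemExit(f"usage: spark {command} <{choices}> [args...]")
        return [os.path.join(spark_home, actions[rest[0]]), *rest[1:]]

    raise SystemExit(f"spark: unknown command {command!r}")


def main():
    # Assumes SPARK_HOME points at the Spark installation; falls back to
    # the current directory otherwise.
    spark_home = os.environ.get("SPARK_HOME", os.getcwd())
    target = resolve(sys.argv[1:], spark_home)
    # Replace the Python process with the target script so that signals
    # and exit codes behave as if the script had been called directly.
    os.execv(target[0], target)
```

With this layout, resolve(["connect", "start", "--master", "local[2]"], "/opt/spark") yields the start script's path followed by the two forwarded arguments. A real entry point would call main() and would also need the status actions, which this sketch leaves out.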

There are several things still to hash out, like:

- Packaging of this unified CLI so it's placed on the PATH correctly.
- The exact naming and structure of the commands.
- Whether we eventually want to fold any Bash scripts directly into this new
  Python CLI.

But before we get into that, I first want to see if this idea is attractive to
the community.

Shall I flesh this out into a ticket and publish a working prototype, or does
the idea have some critical flaw?

Nick