This is an automated email from the ASF dual-hosted git repository.
dheres pushed a change to branch add_collect_statistics
in repository https://gitbox.apache.org/repos/asf/arrow-ballista.git
from 4b13512a Add config to collect statistics
add 85dc0836 updated readme to contain correct versions of dependencies. (#580)
add 03e351f8 Fix benchmark image link (#596)
add 7f285339 Add support for Azure (#599)
add 45034da7 Remove outdated script and use evergreen version of rust (#597)
add f59179a0 feat: update script such that ballista-cli image is built as well (#601)
add 18e2cc88 Fix Cargo.toml format issue (#616)
add cbad55d7 Refactor executor main (#614)
add b1713967 Refactor scheduler main (#615)
add dfc168c6 Python: add method to get explain output as a string (#593)
add a621a3f8 Update contributor guide (#617)
add 93cf14f7 Cluster state refactor part 1 (#560)
add 75b4f845 replace master with main (#621)
add a5f306e5 implement new release process (#623)
add 70032e1f add docs on who can release (#632)
add 0b6b28f8 Upgrade to DataFusion 16 (again) (#636)
add b0d8cc7e Update datafusion dependency to the latest version (#612)
add 1b0be751 Upgrade to DataFusion 17 (#639)
add 1a6bc262 check in benchmark image (#647)
add 918c3448 Remove `python` dir & python-related workflows (#654)
add 0b8d99c3 Handle job resubmission (#586)
add 8f8154f4 Add executor self-registration mechanism in the heartbeat service (#649)
add e7f87741 Cluster state refactor Part 2 (#658)
add a3ac8575 Upgrade to DataFusion 18.0.0-rc1 (#664)
add a9fd64d8 Minor refactor to reduce duplicate code (#659)
add cdfb3a09 move test_util to ballista-examples package (#661)
add cbd6e3e1 Upgrade to DataFusion 18 (#668)
add 5b180a12 Enable physical plan round-trip tests (#666)
add eba125e1 Prep 0.11 (#682)
add 9a6ec0a8 [minor] remove todo (#683)
add 22029020 Add executor terminating status for graceful shutdown (#667)
add cd0d822f Allow `BallistaContext::read_*` methods to read multiple paths. (#679)
add 449ae487 Update scheduler.md (#657)
add 35019e64 Mark `SchedulerState` as pub (#688)
add b61cfbf5 Update graphviz-rust requirement from 0.5.0 to 0.6.1 (#651)
add a95e621a Upgrade DataFusion to 19.0.0 (#691)
add 7bca0ca1 update release notes (#692)
add 565088f4 Make task launcher pub (#695)
add fe6e2f59 Make task_manager pub (#698)
add 85031c42 Add ExecutionEngine abstraction (#687)
add da77fad2 Allow accessing s3 locations in client mode (#700)
add a7dfcb39 deployment/docker-compose.md incorrect remote ref (#699)
add 9206bdb2 Fix for error message during testing (#707)
add 8007a414 Upgrade datafusion to 20.0.0 & sqlparser to to 0.32.0 (#711)
add 620b5cf3 Update README.md (#729)
add f98a3785 Update link to proto file in dev docs (#713)
add a08347a3 Fix `show tables` fails (#715)
add 0b496e59 Remove redundant fields in ExecutorManager (#728)
add 837c8406 Fix parameter '--config-backend' to '--cluster-backend' (#720)
add 32f97930 Upgrade DataFusion to 21.0.0 (#727)
add 48c4c2d9 [minor] remove useless bracelet (#739)
add 4e4842ce Only decode plan in `LaunchMultiTaskParams` once (#743)
add a9ecd3a0 Upgrade DataFusion to 22.0.0 (#740)
add 47718d86 [feature] support shuffle read with retry when facing IO error. (#738)
add 805e346a [log] Print long running task status. (#750)
add ae1d3de4 Upgrade DataFusion to 23.0.0 (#755)
add fe755134 Fix plan metrics length and stage metrics length not match (#764)
add fce15a5a added match arms to create ClusterStorageConfig (#766)
add 30232e00 [Improve] refactor the offer_reservation avoid wait result (#760)
add 9b4a1219 [fea] Avoid multithreaded write lock conflicts in event queue (#754)
add d2f33820 Upgrade DataFusion to 24.0.0 (#769)
add 9b6d9e68 Refine create_datafusion_context() (#778)
add 371a6970 Remove output_partitioning for task definition (#776)
add d56d4b9e Upgrade DataFusion to 25.0.0 (#779)
add d5a55005 Disable the ansi feature of tracing-subscriber (#784)
add 10e021a9 Add config grpc_server_max_decoding_message_size to make the maximum size of a decoded message at the grpc server side configurable (#782)
add b70b372c Fix nodejs issues in Docker build (#731)
add b06dcd86 Upgrade node version to fix build in `main` (#794)
add 544af413 Remove redundant mod session_registry (#792)
add 2f0f27c5 Make last_seen_ts_threshold for getting alive executor at the scheduler side larger than the heartbeat time interval (#786)
add 4e07e02e Remove the prometheus-metrics from the default feature (#788)
add 01bcf7c4 Refine the ExecuteQuery grpc interface (#790)
add 6a69e834 Merge
No new revisions were added by this update.
Summary of changes:
.dockerignore | 1 +
.github/dependabot.yml | 2 +-
.github/workflows/comment_bot.yml | 6 +-
.github/workflows/dev.yml | 4 +-
.github/workflows/python_build.yml | 128 -
.github/workflows/python_test.yaml | 73 -
.github/workflows/rust.yml | 79 +-
.gitignore | 2 +-
CONTRIBUTING.md | 212 +-
Cargo.toml | 40 +-
README.md | 2 +-
ballista-cli/Cargo.toml | 18 +-
ballista-cli/src/command.rs | 6 +-
ballista-cli/src/exec.rs | 12 +-
ballista-cli/src/main.rs | 10 +-
ballista/CHANGELOG.md | 82 +
ballista/client/Cargo.toml | 15 +-
ballista/client/README.md | 4 +-
ballista/client/src/context.rs | 140 +-
ballista/core/Cargo.toml | 20 +-
ballista/core/build.rs | 4 +-
ballista/core/proto/ballista.proto | 485 +-
ballista/core/proto/datafusion.proto | 849 +-
ballista/core/src/client.rs | 110 +-
ballista/core/src/config.rs | 13 +-
ballista/core/src/error.rs | 48 +-
ballista/core/src/event_loop.rs | 7 +-
.../core/src/execution_plans/distributed_query.rs | 59 +-
.../core/src/execution_plans/shuffle_reader.rs | 27 +-
.../core/src/execution_plans/shuffle_writer.rs | 22 +-
.../core/src/execution_plans/unresolved_shuffle.rs | 4 -
ballista/core/src/lib.rs | 2 +-
ballista/core/src/plugin/plugin_manager.rs | 2 +-
ballista/core/src/plugin/udf.rs | 6 +-
ballista/core/src/serde/generated/ballista.rs | 1584 ++--
ballista/core/src/serde/mod.rs | 753 +-
.../core/src/serde/physical_plan/from_proto.rs | 418 -
ballista/core/src/serde/physical_plan/mod.rs | 1707 ----
ballista/core/src/serde/physical_plan/to_proto.rs | 468 --
ballista/core/src/serde/scheduler/from_proto.rs | 95 +-
ballista/core/src/serde/scheduler/mod.rs | 4 +-
ballista/core/src/serde/scheduler/to_proto.rs | 20 +-
ballista/core/src/utils.rs | 121 +-
ballista/executor/Cargo.toml | 39 +-
ballista/executor/build.rs | 2 +-
ballista/executor/executor_config_spec.toml | 14 +-
ballista/executor/src/bin/main.rs | 87 +
ballista/executor/src/collect.rs | 8 +-
ballista/executor/src/execution_engine.rs | 121 +
ballista/executor/src/execution_loop.rs | 38 +-
ballista/executor/src/executor.rs | 95 +-
.../executor/src/{main.rs => executor_process.rs} | 225 +-
ballista/executor/src/executor_server.rs | 287 +-
ballista/executor/src/flight_service.rs | 41 +-
ballista/executor/src/lib.rs | 2 +
ballista/executor/src/metrics/mod.rs | 14 +-
ballista/executor/src/standalone.rs | 4 +-
ballista/scheduler/Cargo.toml | 38 +-
ballista/scheduler/build.rs | 4 +-
ballista/scheduler/scheduler_config_spec.toml | 53 +-
ballista/scheduler/src/api/handlers.rs | 7 +-
ballista/scheduler/src/api/mod.rs | 2 +-
ballista/scheduler/src/bin/main.rs | 151 +
ballista/scheduler/src/cluster/event/mod.rs | 318 +
ballista/scheduler/src/cluster/kv.rs | 857 ++
ballista/scheduler/src/cluster/memory.rs | 570 ++
ballista/scheduler/src/cluster/mod.rs | 441 +
.../src/{state/backend => cluster/storage}/etcd.rs | 26 +-
.../src/{state/backend => cluster/storage}/mod.rs | 76 +-
.../src/{state/backend => cluster/storage}/sled.rs | 74 +-
ballista/scheduler/src/cluster/test/mod.rs | 587 ++
ballista/scheduler/src/config.rs | 114 +-
ballista/scheduler/src/display.rs | 2 +-
ballista/scheduler/src/flight_sql.rs | 131 +-
ballista/scheduler/src/lib.rs | 2 +
ballista/scheduler/src/main.rs | 264 -
ballista/scheduler/src/metrics/mod.rs | 1 +
ballista/scheduler/src/metrics/prometheus.rs | 16 +-
ballista/scheduler/src/planner.rs | 192 +-
ballista/scheduler/src/scheduler_process.rs | 124 +
ballista/scheduler/src/scheduler_server/event.rs | 76 +
.../src/scheduler_server/external_scaler.rs | 2 +-
ballista/scheduler/src/scheduler_server/grpc.rs | 510 +-
ballista/scheduler/src/scheduler_server/mod.rs | 266 +-
.../src/scheduler_server/query_stage_scheduler.rs | 243 +-
ballista/scheduler/src/standalone.rs | 24 +-
ballista/scheduler/src/state/backend/memory.rs | 411 -
.../scheduler/src/state/backend/utils/oneshot.rs | 179 -
.../src/state/backend/utils/subscriber.rs | 248 -
ballista/scheduler/src/state/execution_graph.rs | 664 +-
.../src/state/execution_graph/execution_stage.rs | 45 +-
.../scheduler/src/state/execution_graph_dot.rs | 180 +-
ballista/scheduler/src/state/executor_manager.rs | 819 +-
ballista/scheduler/src/state/mod.rs | 456 +-
ballista/scheduler/src/state/session_manager.rs | 89 +-
ballista/scheduler/src/state/session_registry.rs | 69 -
ballista/scheduler/src/state/task_manager.rs | 397 +-
ballista/scheduler/src/test_utils.rs | 421 +-
ballista/scheduler/ui/.gitignore | 4 +-
ballista/scheduler/ui/README.md | 6 +-
ballista/scheduler/ui/package.json | 4 +-
ballista/scheduler/ui/yarn.lock | 8396 ++++++++++----------
benchmarks/Cargo.toml | 18 +-
benchmarks/src/bin/nyctaxi.rs | 15 +-
benchmarks/src/bin/tpch.rs | 450 +-
ci/scripts/rust_toml_fmt.sh | 2 +-
dev/build-ballista-docker.sh | 7 +-
...lista-rust.sh => build-ballista-executables.sh} | 4 +
dev/build-ui.sh | 23 -
dev/docker/ballista-builder.Dockerfile | 9 +-
...executor.Dockerfile => ballista-cli.Dockerfile} | 11 +-
.../{executor-entrypoint.sh => cli-entrypoint.sh} | 2 +-
dev/release/README.md | 182 +-
dev/release/create-tarball.sh | 2 +-
dev/release/update_change_log-ballista.sh | 2 +-
dev/release/update_change_log.sh | 2 +-
dev/update_ballista_versions.py | 1 -
docker-compose.yml | 2 +-
docs/developer/README.md | 11 -
docs/developer/architecture.md | 2 +-
docs/developer/configuration.md | 34 -
docs/developer/dev-env.md | 51 -
docs/developer/integration-testing.md | 29 -
docs/source/conf.py | 4 +-
docs/source/index.rst | 2 +-
.../source/user-guide/deployment/docker-compose.md | 4 +-
docs/source/user-guide/scheduler.md | 2 +-
...bench-h-workstation-10-distributed-perquery.png | Bin 0 -> 33223 bytes
examples/Cargo.toml | 16 +-
examples/examples/standalone-sql.rs | 5 +-
.../backend/utils/mod.rs => examples/src/lib.rs | 5 +-
examples/src/test_util.rs | 89 +
python/.cargo/config | 22 -
python/.dockerignore | 19 -
python/.gitignore | 20 -
python/CHANGELOG.md | 129 -
python/Cargo.toml | 54 -
python/LICENSE.txt | 202 -
python/README.md | 185 -
python/ballista/__init__.py | 113 -
python/ballista/functions.py | 23 -
python/ballista/tests/__init__.py | 16 -
python/ballista/tests/generic.py | 87 -
python/ballista/tests/test_aggregation.py | 48 -
python/ballista/tests/test_catalog.py | 72 -
python/ballista/tests/test_context.py | 63 -
python/ballista/tests/test_dataframe.py | 181 -
python/ballista/tests/test_functions.py | 219 -
python/ballista/tests/test_imports.py | 65 -
python/ballista/tests/test_sql.py | 250 -
python/ballista/tests/test_udaf.py | 136 -
python/pyproject.toml | 55 -
python/requirements-310.txt | 210 -
python/requirements-37.txt | 318 -
python/requirements.in | 27 -
python/requirements.txt | 282 -
python/src/ballista_context.rs | 137 -
python/src/catalog.rs | 127 -
python/src/dataframe.rs | 170 -
python/src/dataset.rs | 134 -
python/src/dataset_exec.rs | 282 -
python/src/datatype.rs | 39 -
python/src/errors.rs | 111 -
python/src/expression.rs | 138 -
python/src/functions.rs | 346 -
python/src/lib.rs | 73 -
python/src/pyarrow_filter_expression.rs | 219 -
python/src/udaf.rs | 150 -
python/src/udf.rs | 95 -
python/src/utils.rs | 47 -
170 files changed, 13561 insertions(+), 18489 deletions(-)
delete mode 100644 .github/workflows/python_build.yml
delete mode 100644 .github/workflows/python_test.yaml
delete mode 100644 ballista/core/src/serde/physical_plan/from_proto.rs
delete mode 100644 ballista/core/src/serde/physical_plan/mod.rs
delete mode 100644 ballista/core/src/serde/physical_plan/to_proto.rs
create mode 100644 ballista/executor/src/bin/main.rs
create mode 100644 ballista/executor/src/execution_engine.rs
rename ballista/executor/src/{main.rs => executor_process.rs} (75%)
create mode 100644 ballista/scheduler/src/bin/main.rs
create mode 100644 ballista/scheduler/src/cluster/event/mod.rs
create mode 100644 ballista/scheduler/src/cluster/kv.rs
create mode 100644 ballista/scheduler/src/cluster/memory.rs
create mode 100644 ballista/scheduler/src/cluster/mod.rs
rename ballista/scheduler/src/{state/backend => cluster/storage}/etcd.rs (94%)
rename ballista/scheduler/src/{state/backend => cluster/storage}/mod.rs (80%)
rename ballista/scheduler/src/{state/backend => cluster/storage}/sled.rs (85%)
create mode 100644 ballista/scheduler/src/cluster/test/mod.rs
delete mode 100644 ballista/scheduler/src/main.rs
create mode 100644 ballista/scheduler/src/scheduler_process.rs
delete mode 100644 ballista/scheduler/src/state/backend/memory.rs
delete mode 100644 ballista/scheduler/src/state/backend/utils/oneshot.rs
delete mode 100644 ballista/scheduler/src/state/backend/utils/subscriber.rs
delete mode 100644 ballista/scheduler/src/state/session_registry.rs
rename dev/{build-ballista-rust.sh => build-ballista-executables.sh} (79%)
delete mode 100755 dev/build-ui.sh
copy dev/docker/{ballista-executor.Dockerfile => ballista-cli.Dockerfile} (76%)
copy dev/docker/{executor-entrypoint.sh => cli-entrypoint.sh} (96%)
delete mode 100644 docs/developer/configuration.md
delete mode 100644 docs/developer/dev-env.md
delete mode 100644 docs/developer/integration-testing.md
create mode 100644 docs/sqlbench-h-workstation-10-distributed-perquery.png
rename ballista/scheduler/src/state/backend/utils/mod.rs => examples/src/lib.rs (90%)
create mode 100644 examples/src/test_util.rs
delete mode 100644 python/.cargo/config
delete mode 100644 python/.dockerignore
delete mode 100644 python/.gitignore
delete mode 100644 python/CHANGELOG.md
delete mode 100644 python/Cargo.toml
delete mode 100644 python/LICENSE.txt
delete mode 100644 python/README.md
delete mode 100644 python/ballista/__init__.py
delete mode 100644 python/ballista/functions.py
delete mode 100644 python/ballista/tests/__init__.py
delete mode 100644 python/ballista/tests/generic.py
delete mode 100644 python/ballista/tests/test_aggregation.py
delete mode 100644 python/ballista/tests/test_catalog.py
delete mode 100644 python/ballista/tests/test_context.py
delete mode 100644 python/ballista/tests/test_dataframe.py
delete mode 100644 python/ballista/tests/test_functions.py
delete mode 100644 python/ballista/tests/test_imports.py
delete mode 100644 python/ballista/tests/test_sql.py
delete mode 100644 python/ballista/tests/test_udaf.py
delete mode 100644 python/pyproject.toml
delete mode 100644 python/requirements-310.txt
delete mode 100644 python/requirements-37.txt
delete mode 100644 python/requirements.in
delete mode 100644 python/requirements.txt
delete mode 100644 python/src/ballista_context.rs
delete mode 100644 python/src/catalog.rs
delete mode 100644 python/src/dataframe.rs
delete mode 100644 python/src/dataset.rs
delete mode 100644 python/src/dataset_exec.rs
delete mode 100644 python/src/datatype.rs
delete mode 100644 python/src/errors.rs
delete mode 100644 python/src/expression.rs
delete mode 100644 python/src/functions.rs
delete mode 100644 python/src/lib.rs
delete mode 100644 python/src/pyarrow_filter_expression.rs
delete mode 100644 python/src/udaf.rs
delete mode 100644 python/src/udf.rs
delete mode 100644 python/src/utils.rs