This is an automated email from the ASF dual-hosted git repository.
houqp pushed a change to branch arrow2
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git.
from c0c9c72 Officially maintained Arrow2 branch (#1556)
add 54da006 Stop merging avro schemas as it doesn't support list of lists
add 97415ca use the latest arrow2 with Chunk
add b42ebe7 Clarify docs about `Accumulator::update` and
`Accumulator::update_batch` (#1542)
add b05feda Mark ARRAY_AGG(DISTINCT ...) not implemented (#1534)
add 06d147a Add batch operations to stddev (#1547)
add e1e7b86 Address clippy warnings (#1553)
add 14176ff Update to arrow-7.0.0 (#1523)
add 794b92b Remove unused `update` and `merge` implementations from
Aggregates and supporting `ScalarValue` arithmetic (#1550)
add cf76969 Make call SchedulerServer::new once in ballista-scheduler
process (#1537)
add b4c77e5 Add covar operators (#1551)
add d7e465a Initial MemoryManager and DiskManager APIs for query
execution + External Sort implementation (#1526)
add 811bb51 Update to rust 1.58 (#1557)
add 0bddfb7 support cast/try_cast for decimal: signed numeric to decimal
(#1442)
add 1c39f5c support comparison for decimal data type and refactor the
binary coercion rule (#1483)
add b743610 add correlation function (#1561)
add 1dae7e2 Rename sql integration tests from `mod` to `sql_integration`
(#1575)
add bbfc2c0 update reference to python and update readme (#1581)
add 278e859 minor: improve the benchmark readme (#1567)
add 438b417 Tests for support try_cast/cast decimal to numeric (#1465)
add 6f7b2d2 implement Hash for various types and replace PartialOrd
(#1580)
add f027e5f add from_slice trait to ease arrow2 migration (#1588)
add 92a3e45 Consolidate `batch_size` configuration in `ExecutionConfig`,
`RuntimeConfig` and `PhysicalPlanConfig` (#1562)
add 30df911 support from_slice for binary, string, and boolean array
types (#1589)
add 059e52b update nightly version (#1597)
add 82e8003 remove update and merge (#1582)
add c549d51 support mathematics operation for decimal data type (#1554)
add fefbfc8 add test for decimal to decimal (#1603)
add 8ebc94c fix: casting Int64 to Float64 unsuccessfully caused tpch8 to
fail (#1601)
add 444c153 Add support show tables and show columns for ballista (#1593)
add ad392fd Fix comparison of dictionary arrays (#1606)
add 345f727 Replace Datafusion Error with Generic Error for Object store
(#1541)
add eb51fae consolidate binary_expr coercion rule code into
`binary_rule.rs` module (#1607)
add a96bb5e Implement ARRAY_AGG(DISTINCT ...) (#1579)
add d93cf79 Add roadmap to readme (#1616)
add 2f702e4 fix: sql planner creates cross join instead of inner join
from select predicates (#1566)
add e92225d feat: Support complex interval via IntervalMonthDayNano
(#1615)
add 03075d5 Fix null comparison for Parquet pruning predicate (#1595)
add 3c5a679 fix dependabot (#1625)
add 7d819d1 Consolidate sort and external_sort (#1596)
add 62edddb Optimize `SortPreservingMergeStream` to avoid `SortKeyCursor`
sharing (#1624)
add cc8f325 Update pyo3 requirement from 0.14 to 0.15 (#1627)
add 67a598c Update etcd-client requirement from 0.7 to 0.8 (#1626)
add 0762bf0 Update hashbrown requirement from 0.11 to 0.12 (#1631)
add af8786e support hash decimal array and group by (#1640)
add 1c63759 Add spill_count and spilled_bytes to baseline metrics, test
sort with spill of metrics (#1641)
add 15af24a Add `DataFusionError` -> `ArrowError` conversion (#1643)
add 9c5ccae update md-5, sha2, blake2 (#1647)
add 71757bb Introduce push-based task scheduling for Ballista (#1560)
add 4a2453a fix a cte block with same name for many times (#1639)
add deaa8ac Handle merging of evolved schemas in ParquetExec (#1622)
add 01b5244 refine match pattern related code (#1650)
add 6ec18bb Consolidate Schema and RecordBatch projection (#1638)
add 741df36 Remove DataFusionError::into_arrow_external_error (#1645)
add c63cfd4 Move AggregatedMetricsSet to metrics for further reuse (#1663)
add 97f95b3 Make `MemoryManager` and `MemoryStream` public (#1664)
add 618c1e8 feat: Support Substring(str [from int] [for int]) (#1621)
add 2a9df64 [Ballista] Fix scheduler state mod bug (#1655)
add 992624a Fix predicate pushdown for outer joins (#1618)
add 271b6ba feat: Support quarter granularity in date_trunct fn (#1667)
add 6c8d642 Update to arrow 8.0.0 (#1673)
add bf68073 [Ballista] Add Decimal128, Date64, TimestampSecond,
TimestampMillisecond, Interv… (#1659)
add ee91c68 upgrade clap to version 3 (#1672)
add 7153fac Improve configuration and resource use of `MemoryManager` and
`DiskManager` (#1668)
add bffa5e4 Use NamedTempFile rather than `String` in DiskManager (#1680)
add 48ad975 Add VegaFusion as project that uses DataFusion (#1683)
add d297540 Add a new metric type: `Gauge` + `CurrentMemoryUsage` to
metrics (#1682)
add fdbd608 enhance arithmetic operation for array with scalar (#1552)
add bf71577 refactor array_agg to not to have `update` and `merge` (#1681)
add 2266474 Fix bug while merging `RecordBatch`, add
`SortPreservingMerge` fuzz tester (#1678)
add 63d24bf Make `SortPreservingMergeStream` stable on input stream order
(#1687)
add 18918fa Merge branch 'master' into i_arrow2
add a7ec38e resolve up to last datafusion issue with SortColumn using a
reference that cannot be returned
add eea061f Fix other crates and address check warnings
add ab48bb2 clippy
add 98f98d1 test fix #1, errors and debug strings
No new revisions were added by this update.
Summary of changes:
.env | 2 +-
.github/dependabot.yml | 6 +-
.github/workflows/rust.yml | 2 +-
Cargo.toml | 4 +-
README.md | 68 +-
ballista-examples/Cargo.toml | 2 +-
ballista/rust/client/Cargo.toml | 4 +-
ballista/rust/client/src/columnar_batch.rs | 7 +-
ballista/rust/client/src/context.rs | 206 ++-
ballista/rust/core/Cargo.toml | 8 +-
ballista/rust/core/proto/ballista.proto | 128 +-
ballista/rust/core/src/client.rs | 11 +-
ballista/rust/core/src/config.rs | 82 +-
ballista/rust/core/src/error.rs | 2 +-
.../core/src/execution_plans/distributed_query.rs | 4 +-
.../core/src/execution_plans/shuffle_reader.rs | 10 +-
.../core/src/execution_plans/shuffle_writer.rs | 95 +-
.../core/src/execution_plans/unresolved_shuffle.rs | 7 +-
ballista/rust/core/src/lib.rs | 1 -
ballista/rust/core/src/memory_stream.rs | 92 --
.../rust/core/src/serde/logical_plan/from_proto.rs | 111 +-
ballista/rust/core/src/serde/logical_plan/mod.rs | 52 +-
.../rust/core/src/serde/logical_plan/to_proto.rs | 88 +-
ballista/rust/core/src/serde/mod.rs | 49 +-
.../core/src/serde/physical_plan/from_proto.rs | 19 +-
ballista/rust/core/src/serde/physical_plan/mod.rs | 3 +-
.../rust/core/src/serde/physical_plan/to_proto.rs | 12 +-
ballista/rust/core/src/serde/scheduler/mod.rs | 141 +++
ballista/rust/core/src/utils.rs | 22 +-
ballista/rust/executor/Cargo.toml | 5 +-
ballista/rust/executor/executor_config_spec.toml | 13 +
ballista/rust/executor/src/collect.rs | 11 +-
ballista/rust/executor/src/execution_loop.rs | 38 +-
ballista/rust/executor/src/executor.rs | 22 +-
ballista/rust/executor/src/executor_server.rs | 291 +++++
ballista/rust/executor/src/flight_service.rs | 12 +-
ballista/rust/executor/src/lib.rs | 39 +
ballista/rust/executor/src/main.rs | 65 +-
ballista/rust/executor/src/standalone.rs | 2 +
ballista/rust/scheduler/Cargo.toml | 4 +-
ballista/rust/scheduler/scheduler_config_spec.toml | 9 +-
ballista/rust/scheduler/src/lib.rs | 457 ++++++-
ballista/rust/scheduler/src/main.rs | 48 +-
ballista/rust/scheduler/src/planner.rs | 2 +-
ballista/rust/scheduler/src/standalone.rs | 6 +-
ballista/rust/scheduler/src/state/mod.rs | 235 +++-
ballista/rust/scheduler/src/test_utils.rs | 1 +
benchmarks/.gitignore | 2 +-
benchmarks/Cargo.toml | 4 +-
benchmarks/README.md | 2 +-
benchmarks/src/bin/nyctaxi.rs | 13 +-
benchmarks/src/bin/tpch.rs | 24 +-
ci/docker/linux-apt-lint.dockerfile | 2 +-
datafusion-cli/Cargo.toml | 6 +-
datafusion-cli/Dockerfile | 2 +-
datafusion-cli/src/command.rs | 3 +-
datafusion-cli/src/exec.rs | 3 +-
datafusion-cli/src/functions.rs | 15 +-
datafusion-cli/src/main.rs | 34 +-
datafusion-cli/src/print_format.rs | 23 +-
datafusion-cli/src/print_options.rs | 2 +-
datafusion-examples/Cargo.toml | 6 +-
.../examples/dataframe_in_memory.rs | 3 +-
datafusion-examples/examples/flight_client.rs | 8 +-
datafusion-examples/examples/flight_server.rs | 14 +-
datafusion-examples/examples/simple_udaf.rs | 64 +-
datafusion-examples/examples/simple_udf.rs | 6 +-
datafusion/Cargo.toml | 17 +-
datafusion/benches/data_utils/mod.rs | 4 +-
datafusion/benches/filter_query_sql.rs | 3 +-
datafusion/benches/math_query_sql.rs | 3 +-
datafusion/benches/physical_plan.rs | 14 +-
datafusion/benches/sort_limit_query_sql.rs | 10 +-
datafusion/src/arrow_print.rs | 5 +-
datafusion/src/avro_to_arrow/arrow_array_reader.rs | 15 +-
datafusion/src/avro_to_arrow/reader.rs | 4 +-
datafusion/src/avro_to_arrow/schema.rs | 1 +
datafusion/src/cast.rs | 52 +
datafusion/src/catalog/catalog.rs | 12 +
datafusion/src/catalog/information_schema.rs | 3 +-
datafusion/src/catalog/mod.rs | 1 +
datafusion/src/catalog/schema.rs | 7 +
datafusion/src/dataframe.rs | 2 +-
datafusion/src/datasource/datasource.rs | 1 -
datafusion/src/datasource/empty.rs | 15 +-
datafusion/src/datasource/file_format/avro.rs | 61 +-
datafusion/src/datasource/file_format/csv.rs | 49 +-
datafusion/src/datasource/file_format/json.rs | 30 +-
datafusion/src/datasource/file_format/mod.rs | 4 +-
datafusion/src/datasource/file_format/parquet.rs | 117 +-
datafusion/src/datasource/listing/helpers.rs | 3 +-
datafusion/src/datasource/listing/table.rs | 26 +-
datafusion/src/datasource/memory.rs | 62 +-
datafusion/src/datasource/mod.rs | 6 +-
datafusion/src/datasource/object_store/local.rs | 3 +-
datafusion/src/datasource/object_store/mod.rs | 10 +-
datafusion/src/error.rs | 84 +-
datafusion/src/execution/context.rs | 232 +++-
datafusion/src/execution/dataframe_impl.rs | 14 +-
datafusion/src/execution/disk_manager.rs | 165 +++
datafusion/src/execution/memory_manager.rs | 601 +++++++++
datafusion/src/execution/mod.rs | 6 +
datafusion/src/execution/options.rs | 6 +
datafusion/src/execution/runtime_env.rs | 138 ++
datafusion/src/field_util.rs | 381 +++++-
datafusion/src/lib.rs | 16 +-
datafusion/src/logical_plan/builder.rs | 10 +-
datafusion/src/logical_plan/dfschema.rs | 42 +-
datafusion/src/logical_plan/display.rs | 1 +
datafusion/src/logical_plan/expr.rs | 96 +-
datafusion/src/logical_plan/extension.rs | 2 +-
datafusion/src/logical_plan/plan.rs | 6 +-
.../src/optimizer/common_subexpr_eliminate.rs | 6 +
datafusion/src/optimizer/eliminate_limit.rs | 1 +
datafusion/src/optimizer/filter_push_down.rs | 465 ++++---
datafusion/src/optimizer/limit_push_down.rs | 1 +
datafusion/src/optimizer/mod.rs | 1 +
datafusion/src/optimizer/projection_push_down.rs | 5 +-
datafusion/src/optimizer/simplify_expressions.rs | 45 +-
.../src/optimizer/single_distinct_to_groupby.rs | 1 +
datafusion/src/optimizer/utils.rs | 78 +-
.../src/physical_optimizer/aggregate_statistics.rs | 10 +-
.../src/physical_optimizer/coalesce_batches.rs | 3 +-
.../physical_optimizer/hash_build_probe_order.rs | 2 +
datafusion/src/physical_optimizer/merge_exec.rs | 1 +
datafusion/src/physical_optimizer/pruning.rs | 77 +-
datafusion/src/physical_optimizer/repartition.rs | 10 +-
datafusion/src/physical_plan/aggregates.rs | 273 +++-
datafusion/src/physical_plan/analyze.rs | 16 +-
datafusion/src/physical_plan/coalesce_batches.rs | 46 +-
.../src/physical_plan/coalesce_partitions.rs | 24 +-
.../physical_plan/coercion_rule/aggregate_rule.rs | 30 +-
.../src/physical_plan/coercion_rule/binary_rule.rs | 625 +++++++++
datafusion/src/physical_plan/coercion_rule/mod.rs | 2 +
datafusion/src/physical_plan/common.rs | 96 +-
datafusion/src/physical_plan/cross_join.rs | 17 +-
datafusion/src/physical_plan/crypto_expressions.rs | 10 +-
.../src/physical_plan/datetime_expressions.rs | 61 +-
datafusion/src/physical_plan/empty.rs | 21 +-
datafusion/src/physical_plan/explain.rs | 17 +-
.../physical_plan/expressions/approx_distinct.rs | 17 -
.../src/physical_plan/expressions/array_agg.rs | 64 +-
.../src/physical_plan/expressions/average.rs | 26 +-
datafusion/src/physical_plan/expressions/binary.rs | 1326 +++++++++++++++++---
datafusion/src/physical_plan/expressions/case.rs | 52 +-
datafusion/src/physical_plan/expressions/cast.rs | 399 +++++-
.../src/physical_plan/expressions/coercion.rs | 247 ----
datafusion/src/physical_plan/expressions/column.rs | 7 +-
.../src/physical_plan/expressions/correlation.rs | 544 ++++++++
datafusion/src/physical_plan/expressions/count.rs | 21 +-
.../src/physical_plan/expressions/covariance.rs | 725 +++++++++++
.../src/physical_plan/expressions/cume_dist.rs | 3 +-
.../{ => expressions}/distinct_expressions.rs | 320 ++++-
.../physical_plan/expressions/get_indexed_field.rs | 8 +-
.../src/physical_plan/expressions/in_list.rs | 10 +-
.../src/physical_plan/expressions/is_not_null.rs | 9 +-
.../src/physical_plan/expressions/is_null.rs | 8 +-
.../src/physical_plan/expressions/lead_lag.rs | 9 +-
.../src/physical_plan/expressions/literal.rs | 7 +-
.../src/physical_plan/expressions/min_max.rs | 23 +-
datafusion/src/physical_plan/expressions/mod.rs | 65 +-
.../src/physical_plan/expressions/negative.rs | 7 +-
datafusion/src/physical_plan/expressions/not.rs | 3 +-
.../src/physical_plan/expressions/nth_value.rs | 5 +-
datafusion/src/physical_plan/expressions/nullif.rs | 4 +-
datafusion/src/physical_plan/expressions/rank.rs | 25 +-
.../src/physical_plan/expressions/row_number.rs | 5 +-
datafusion/src/physical_plan/expressions/stddev.rs | 113 +-
datafusion/src/physical_plan/expressions/sum.rs | 14 +-
.../src/physical_plan/expressions/try_cast.rs | 320 ++++-
.../src/physical_plan/expressions/variance.rs | 240 ++--
datafusion/src/physical_plan/file_format/avro.rs | 46 +-
datafusion/src/physical_plan/file_format/csv.rs | 53 +-
.../src/physical_plan/file_format/file_stream.rs | 2 +-
datafusion/src/physical_plan/file_format/json.rs | 36 +-
datafusion/src/physical_plan/file_format/mod.rs | 17 +-
.../src/physical_plan/file_format/parquet.rs | 490 +++++++-
datafusion/src/physical_plan/filter.rs | 26 +-
datafusion/src/physical_plan/functions.rs | 17 +-
datafusion/src/physical_plan/hash_aggregate.rs | 86 +-
datafusion/src/physical_plan/hash_join.rs | 77 +-
datafusion/src/physical_plan/hash_utils.rs | 61 +-
datafusion/src/physical_plan/join_utils.rs | 1 +
datafusion/src/physical_plan/limit.rs | 27 +-
datafusion/src/physical_plan/memory.rs | 61 +-
datafusion/src/physical_plan/metrics/aggregated.rs | 155 +++
datafusion/src/physical_plan/metrics/baseline.rs | 39 +-
datafusion/src/physical_plan/metrics/builder.rs | 49 +-
datafusion/src/physical_plan/metrics/mod.rs | 24 +-
datafusion/src/physical_plan/metrics/value.rs | 139 +-
datafusion/src/physical_plan/mod.rs | 130 +-
datafusion/src/physical_plan/planner.rs | 14 +-
datafusion/src/physical_plan/projection.rs | 34 +-
datafusion/src/physical_plan/regex_expressions.rs | 2 +-
datafusion/src/physical_plan/repartition.rs | 55 +-
datafusion/src/physical_plan/sort.rs | 565 ---------
datafusion/src/physical_plan/sorts/mod.rs | 312 +++++
datafusion/src/physical_plan/sorts/sort.rs | 907 +++++++++++++
.../{ => sorts}/sort_preserving_merge.rs | 721 ++++++-----
datafusion/src/physical_plan/stream.rs | 5 +-
datafusion/src/physical_plan/type_coercion.rs | 1 +
datafusion/src/physical_plan/udaf.rs | 2 +-
datafusion/src/physical_plan/udf.rs | 2 +-
.../src/physical_plan/unicode_expressions.rs | 8 +-
datafusion/src/physical_plan/union.rs | 31 +-
datafusion/src/physical_plan/values.rs | 12 +-
datafusion/src/physical_plan/window_functions.rs | 2 +-
datafusion/src/physical_plan/windows/aggregate.rs | 2 +-
datafusion/src/physical_plan/windows/built_in.rs | 2 +-
datafusion/src/physical_plan/windows/mod.rs | 15 +-
.../src/physical_plan/windows/window_agg_exec.rs | 19 +-
datafusion/src/record_batch.rs | 432 +++++++
datafusion/src/scalar.rs | 639 +---------
datafusion/src/sql/planner.rs | 205 ++-
datafusion/src/test/exec.rs | 41 +-
datafusion/src/test/mod.rs | 11 +-
datafusion/src/test/variable.rs | 2 +
datafusion/src/test_util.rs | 1 +
datafusion/tests/custom_sources.rs | 29 +-
datafusion/tests/dataframe.rs | 7 +-
datafusion/tests/dataframe_functions.rs | 5 +-
datafusion/tests/merge_fuzz.rs | 222 ++++
datafusion/tests/parquet_pruning.rs | 6 +-
datafusion/tests/path_partition.rs | 1 +
datafusion/tests/provider_filter_pushdown.rs | 15 +-
datafusion/tests/sql/aggregates.rs | 90 +-
datafusion/tests/sql/avro.rs | 5 +-
datafusion/tests/sql/errors.rs | 3 +-
datafusion/tests/sql/explain_analyze.rs | 20 +-
datafusion/tests/sql/expr.rs | 30 +
datafusion/tests/sql/joins.rs | 229 +++-
datafusion/tests/sql/mod.rs | 53 +-
datafusion/tests/sql/parquet.rs | 6 +-
datafusion/tests/sql/select.rs | 89 +-
datafusion/tests/sql/timestamp.rs | 4 +-
datafusion/tests/sql_integration.rs | 1 +
datafusion/tests/statistics.rs | 27 +-
datafusion/tests/user_defined_plan.rs | 17 +-
dev/docker/ballista-base.dockerfile | 2 +-
239 files changed, 14188 insertions(+), 3895 deletions(-)
create mode 100644 ballista/rust/executor/src/executor_server.rs
create mode 100644 datafusion/src/avro_to_arrow/schema.rs
create mode 100644 datafusion/src/cast.rs
create mode 100644 datafusion/src/execution/disk_manager.rs
create mode 100644 datafusion/src/execution/memory_manager.rs
create mode 100644 datafusion/src/execution/runtime_env.rs
create mode 100644 datafusion/src/physical_plan/coercion_rule/binary_rule.rs
delete mode 100644 datafusion/src/physical_plan/expressions/coercion.rs
create mode 100644 datafusion/src/physical_plan/expressions/correlation.rs
create mode 100644 datafusion/src/physical_plan/expressions/covariance.rs
rename datafusion/src/physical_plan/{ => expressions}/distinct_expressions.rs
(70%)
create mode 100644 datafusion/src/physical_plan/metrics/aggregated.rs
create mode 100644 datafusion/src/physical_plan/sorts/mod.rs
create mode 100644 datafusion/src/physical_plan/sorts/sort.rs
rename datafusion/src/physical_plan/{ => sorts}/sort_preserving_merge.rs (67%)
create mode 100644 datafusion/src/record_batch.rs
create mode 100644 datafusion/tests/merge_fuzz.rs
create mode 100644 datafusion/tests/sql_integration.rs