Sorry, my mistake. I forgot to update the verification script.

On Wed, 17 Aug 2022 at 16:46, Andrew Lamb <al...@influxdata.com> wrote:

> This looks similar to [1]
>
> Do you by any chance have the ARROW_TEST_DATA environment set? If so I
> think it needs to end with a `/` or be unset to run the script.
>
> The difference is that something is wrong with the path normalization:
>
> Expected:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
>
> Actual
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
>
> Andrew
>
> [1] https://github.com/apache/arrow-datafusion/issues/2719
>
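(For anyone else who hits this: a minimal sketch of the workaround Andrew
suggests above, assuming the test harness substitutes the literal value of
ARROW_TEST_DATA into the expected plan text; the actual script internals may
differ.)

    # Run this before the verification script: either unset the variable
    # or make sure its value ends with a trailing slash.
    if [ -n "${ARROW_TEST_DATA:-}" ]; then
        case "$ARROW_TEST_DATA" in
            */) ;;                                   # already ends with /
            *) export ARROW_TEST_DATA="${ARROW_TEST_DATA}/" ;;
        esac
    fi
    # or simply: unset ARROW_TEST_DATA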
> On Tue, Aug 16, 2022 at 9:18 PM Remzi Yang <1371656737...@gmail.com>
> wrote:
>
> > Some tests failed. Verified on M1 Mac.
> >
> > failures:
> >
> >
> > ---- sql::explain_analyze::csv_explain stdout ----
> >
> > thread 'sql::explain_analyze::csv_explain' panicked at 'assertion failed:
> > `(left == right)`
> >
> >   left: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter:
> > #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100
> > projection=[c1, c2], partial_filters=[#aggregate_test_100.c2 >
> > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n
> > CoalesceBatchesExec:
> > target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n
> >  RepartitionExec:
> > partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c2]\n"]]`,
> >
> >  right: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter:
> > #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100
> > projection=[c1, c2], partial_filters=[#aggregate_test_100.c2 >
> > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n
> > CoalesceBatchesExec:
> > target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n
> >  RepartitionExec:
> > partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c2]\n"]]`',
> > datafusion/core/tests/sql/explain_analyze.rs:769:5
> >
> >
> > ---- sql::explain_analyze::test_physical_plan_display_indent_multi_children stdout ----
> >
> > thread
> > 'sql::explain_analyze::test_physical_plan_display_indent_multi_children'
> > panicked at 'assertion failed: `(left == right)`
> >
> >   left: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec:
> > target_batch_size=4096", "    HashJoinExec: mode=Partitioned,
> > join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name:
> > \"c2\", index: 0 })]", "      CoalesceBatchesExec:
> target_batch_size=4096",
> > "        RepartitionExec: partitioning=Hash([Column { name: \"c1\",
> index:
> > 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "
> >    ProjectionExec:
> > expr=[c1@0 as c1]", "              RepartitionExec:
> > partitioning=RoundRobinBatch(9000)", "                CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]", "      CoalesceBatchesExec:
> > target_batch_size=4096", "        RepartitionExec:
> > partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "
> >    ProjectionExec:
> > expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "
> >         RepartitionExec: partitioning=RoundRobinBatch(9000)", "
> >     CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv],
> > has_header=true, limit=None, projection=[c1]"]`,
> >
> >  right: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec:
> > target_batch_size=4096", "    HashJoinExec: mode=Partitioned,
> > join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name:
> > \"c2\", index: 0 })]", "      CoalesceBatchesExec:
> target_batch_size=4096",
> > "        RepartitionExec: partitioning=Hash([Column { name: \"c1\",
> index:
> > 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "
> >    ProjectionExec:
> > expr=[c1@0 as c1]", "              RepartitionExec:
> > partitioning=RoundRobinBatch(9000)", "                CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]", "      CoalesceBatchesExec:
> > target_batch_size=4096", "        RepartitionExec:
> > partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "
> >    ProjectionExec:
> > expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "
> >         RepartitionExec: partitioning=RoundRobinBatch(9000)", "
> >     CsvExec: files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv],
> > has_header=true, limit=None, projection=[c1]"]`: expected:
> >
> > [
> >
> >     "ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "  CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column {
> > name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
> >
> >     "      CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "        RepartitionExec: partitioning=Hash([Column { name: \"c1\",
> > index: 0 }], 9000)",
> >
> >     "          ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "            ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
> >
> >     "                CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]",
> >
> >     "      CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "        RepartitionExec: partitioning=Hash([Column { name: \"c2\",
> > index: 0 }], 9000)",
> >
> >     "          ProjectionExec: expr=[c2@0 as c2]",
> >
> >     "            ProjectionExec: expr=[c1@0 as c2]",
> >
> >     "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
> >
> >     "                CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]",
> >
> > ]
> >
> > actual:
> >
> >
> > [
> >
> >     "ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "  CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column {
> > name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
> >
> >     "      CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "        RepartitionExec: partitioning=Hash([Column { name: \"c1\",
> > index: 0 }], 9000)",
> >
> >     "          ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "            ProjectionExec: expr=[c1@0 as c1]",
> >
> >     "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
> >
> >     "                CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]",
> >
> >     "      CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "        RepartitionExec: partitioning=Hash([Column { name: \"c2\",
> > index: 0 }], 9000)",
> >
> >     "          ProjectionExec: expr=[c2@0 as c2]",
> >
> >     "            ProjectionExec: expr=[c1@0 as c2]",
> >
> >     "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
> >
> >     "                CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1]",
> >
> > ]
> >
> > ', datafusion/core/tests/sql/explain_analyze.rs:734:5
> >
> >
> > ---- sql::explain_analyze::test_physical_plan_display_indent stdout ----
> >
> > thread 'sql::explain_analyze::test_physical_plan_display_indent' panicked
> > at 'assertion failed: `(left == right)`
> >
> >   left: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2
> > DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0
> as
> > c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12),
> > MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec:
> > mode=FinalPartitioned, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12),
> > MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec:
> > target_batch_size=4096", "            RepartitionExec:
> > partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "
> >   AggregateExec: mode=Partial, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "
> >         CoalesceBatchesExec: target_batch_size=4096", "
> >   FilterExec:
> > c12@1 < CAST(10 AS Float64)", "                    RepartitionExec:
> > partitioning=RoundRobinBatch(9000)", "                      CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c12]"]`,
> >
> >  right: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2
> > DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0
> as
> > c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12),
> > MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec:
> > mode=FinalPartitioned, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12),
> > MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec:
> > target_batch_size=4096", "            RepartitionExec:
> > partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "
> >   AggregateExec: mode=Partial, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "
> >         CoalesceBatchesExec: target_batch_size=4096", "
> >   FilterExec:
> > c12@1 < CAST(10 AS Float64)", "                    RepartitionExec:
> > partitioning=RoundRobinBatch(9000)", "                      CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c12]"]`: expected:
> >
> > [
> >
> >     "GlobalLimitExec: skip=None, fetch=10",
> >
> >     "  SortExec: [the_min@2 DESC]",
> >
> >     "    CoalescePartitionsExec",
> >
> >     "      ProjectionExec: expr=[c1@0 as c1,
> MAX(aggregate_test_100.c12)@1
> > as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as
> the_min]",
> >
> >     "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
> >
> >     "          CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "            RepartitionExec: partitioning=Hash([Column { name:
> \"c1\",
> > index: 0 }], 9000)",
> >
> >     "              AggregateExec: mode=Partial, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
> >
> >     "                CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "                  FilterExec: c12@1 < CAST(10 AS Float64)",
> >
> >     "                    RepartitionExec:
> > partitioning=RoundRobinBatch(9000)",
> >
> >     "                      CsvExec:
> > files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c12]",
> >
> > ]
> >
> > actual:
> >
> >
> > [
> >
> >     "GlobalLimitExec: skip=None, fetch=10",
> >
> >     "  SortExec: [the_min@2 DESC]",
> >
> >     "    CoalescePartitionsExec",
> >
> >     "      ProjectionExec: expr=[c1@0 as c1,
> MAX(aggregate_test_100.c12)@1
> > as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as
> the_min]",
> >
> >     "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
> >
> >     "          CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "            RepartitionExec: partitioning=Hash([Column { name:
> \"c1\",
> > index: 0 }], 9000)",
> >
> >     "              AggregateExec: mode=Partial, gby=[c1@0 as c1],
> > aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
> >
> >     "                CoalesceBatchesExec: target_batch_size=4096",
> >
> >     "                  FilterExec: c12@1 < CAST(10 AS Float64)",
> >
> >     "                    RepartitionExec:
> > partitioning=RoundRobinBatch(9000)",
> >
> >     "                      CsvExec:
> > files=[privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true,
> > limit=None, projection=[c1, c12]",
> >
> > ]
> >
> > ', datafusion/core/tests/sql/explain_analyze.rs:683:5
> >
> >
> >
> > failures:
> >
> >     sql::explain_analyze::csv_explain
> >
> >     sql::explain_analyze::test_physical_plan_display_indent
> >
> >
> >     sql::explain_analyze::test_physical_plan_display_indent_multi_children
> >
> >
> > test result: FAILED. 459 passed; 3 failed; 2 ignored; 0 measured; 0
> > filtered out; finished in 2.76s
> >
> >
> > error: test failed, to rerun pass '-p datafusion --test sql_integration'
> >
> > + cleanup
> >
> > + '[' no = yes ']'
> >
> > On Wed, 17 Aug 2022 at 04:36, Ian Joiner <iajoiner...@gmail.com> wrote:
> >
> > > Never mind the PS in the last email haha.
> > >
> > > On Tue, Aug 16, 2022 at 1:16 PM Ian Joiner <iajoiner...@gmail.com> wrote:
> > >
> > > > +1 (Non-binding)
> > > >
> > > > Verified on macOS 12.2.1 / Apple M1 Pro
> > > >
> > > > P.S. If verified with zsh instead of bash, we get a `command not found`
> > > > for `shasum -a 256 -c` at verify_dir_artifact_signatures:10 unless
> > > > shasums are disabled, which has been happening for a while. Not sure
> > > > whether we want to fix this; if so, I will file a PR for that.
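(A possible interim workaround, sketched on the assumption that shasum is
simply not found on the PATH under zsh; sha256sum is the usual alternative
where GNU coreutils is installed. The variable sha256_file below is a
hypothetical placeholder for the .sha256 file being checked.)

    # Use whichever SHA-256 checker is available.
    if command -v shasum >/dev/null 2>&1; then
        shasum -a 256 -c "$sha256_file"
    elif command -v sha256sum >/dev/null 2>&1; then
        sha256sum -c "$sha256_file"
    else
        echo "no SHA-256 tool found" >&2
        exit 1
    fi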
> > > >
> > > > On Tue, Aug 16, 2022 at 12:15 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I would like to propose a release of Apache Arrow DataFusion
> > > >> Implementation,
> > > >> version 11.0.0.
> > > >>
> > > >> This release candidate is based on commit:
> > > >> 8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1 [1]
> > > >> The proposed release tarball and signatures are hosted at [2].
> > > >> The changelog is located at [3].
> > > >>
> > > >> Please download, verify checksums and signatures, run the unit tests,
> > > >> and vote on the release. The vote will be open for at least 72 hours.
> > > >>
> > > >> Only votes from PMC members are binding, but all members of the
> > > >> community are encouraged to test the release and vote with
> > > >> "(non-binding)".
> > > >>
> > > >> The standard verification procedure is documented at
> > > >> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > >> .
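(For reference, the usual invocation per that README looks roughly like the
following; the script path and arguments are an assumption based on the
dev/release layout and may change between releases.)

    # Clone the repo, then verify the 11.0.0 RC1 artifacts: this downloads
    # the tarball, checks checksums and signatures, and runs the tests.
    git clone https://github.com/apache/arrow-datafusion.git
    cd arrow-datafusion
    dev/release/verify-release-candidate.sh 11.0.0 1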
> > > >>
> > > >> [ ] +1 Release this as Apache Arrow DataFusion 11.0.0
> > > >> [ ] +0
> > > >> [ ] -1 Do not release this as Apache Arrow DataFusion 11.0.0
> > > >> because...
> > > >>
> > > >> Here is my vote:
> > > >>
> > > >> +1
> > > >>
> > > >> [1]:
> > > >> https://github.com/apache/arrow-datafusion/tree/8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1
> > > >> [2]:
> > > >> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-11.0.0-rc1
> > > >> [3]:
> > > >> https://github.com/apache/arrow-datafusion/blob/8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1/CHANGELOG.md
> > > >>
> > > >
> > >
> >
>
