benibus commented on code in PR #35727: URL: https://github.com/apache/arrow/pull/35727#discussion_r1233439604
########## cpp/src/arrow/compute/kernels/vector_sort_internal.h: ########## @@ -737,17 +757,32 @@ struct ResolvedTableSortKey { static Result<std::vector<ResolvedTableSortKey>> Make( const Table& table, const RecordBatchVector& batches, const std::vector<SortKey>& sort_keys) { - auto factory = [&](const SortField& f) { - const auto& type = table.schema()->field(f.field_index)->type(); + auto factory = [&](const SortField& f) -> Result<ResolvedTableSortKey> { + std::shared_ptr<DataType> type; + int64_t null_count = 0; + ArrayVector chunks; + chunks.reserve(batches.size()); + // We must expose a homogenous chunking for all ResolvedSortKey, - // so we can't simply pass `table.column(f.field_index)` - ArrayVector chunks(batches.size()); - std::transform(batches.begin(), batches.end(), chunks.begin(), - [&](const std::shared_ptr<RecordBatch>& batch) { - return batch->column(f.field_index); - }); - return ResolvedTableSortKey(type, std::move(chunks), f.order, - table.column(f.field_index)->null_count()); + // so we can't simply access the column from the table directly. + if (f.is_nested()) { + ARROW_ASSIGN_OR_RAISE(auto schema_field, f.path.Get(*table.schema())); + type = schema_field->type(); + for (const auto& batch : batches) { + ARROW_ASSIGN_OR_RAISE(auto child, f.path.GetFlattened(*batch)); + null_count += child->null_count(); + chunks.push_back(std::move(child)); + } + } else { + null_count = table.column(f.path[0])->null_count(); Review Comment: No, probably not a meaningful difference in this case (`GetFlattened` should be mostly trivial for top-level fields). Removed the non-nested branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org