tustvold commented on code in PR #1716:
URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878677472
##########
parquet/src/arrow/schema.rs:
##########
@@ -155,24 +100,24 @@ fn get_arrow_schema_from_metadata(encoded_meta: &str) -> Result<Schema> {
         Ok(message) => message
             .header_as_schema()
             .map(arrow::ipc::convert::fb_to_schema)
-            .ok_or(ArrowError("the message is not Arrow Schema".to_string())),
+            .ok_or(arrow_err!("the message is not Arrow Schema")),
Review Comment:
Drive-by cleanup to move to the `arrow_err!` macro.
##########
parquet/src/schema/types.rs:
##########
@@ -847,13 +847,13 @@ pub struct SchemaDescriptor {
     // `schema` in DFS order.
     leaves: Vec<ColumnDescPtr>,
-    // Mapping from a leaf column's index to the root column type that it
+    // Mapping from a leaf column's index to the root column index that it
     // comes from. For instance: the leaf `a.b.c.d` would have a link back to `a`:
     // -- a  <-----+
     // -- -- b     |
     // -- -- -- c  |
     // -- -- -- -- d
-    leaf_to_base: Vec<TypePtr>,
+    leaf_to_base: Vec<usize>,
Review Comment:
This change to store the root column index, as opposed to a copy of the root pointer, makes it easier to convert a root mask into a leaf mask.
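For illustration, a rough sketch of the conversion this mapping enables (a hypothetical helper, not code from this PR): with `leaf_to_base` holding root column indices, expanding a mask over root columns into a mask over leaf columns is a single lookup per leaf.

```rust
/// Hypothetical helper: expand a mask over root (top-level) columns into a
/// mask over leaf columns, using the leaf -> root index mapping.
fn root_mask_to_leaf_mask(leaf_to_base: &[usize], root_mask: &[bool]) -> Vec<bool> {
    leaf_to_base
        .iter()
        .map(|&root_idx| root_mask[root_idx])
        .collect()
}
```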
##########
parquet/src/arrow/mod.rs:
##########
@@ -133,11 +140,71 @@ pub use self::arrow_reader::ParquetFileArrowReader;
pub use self::arrow_writer::ArrowWriter;
#[cfg(feature = "async")]
pub use self::async_reader::ParquetRecordBatchStreamBuilder;
+use crate::schema::types::SchemaDescriptor;
pub use self::schema::{
arrow_to_parquet_schema, parquet_to_arrow_schema,
parquet_to_arrow_schema_by_columns,
- parquet_to_arrow_schema_by_root_columns,
};
/// Schema metadata key used to store serialized Arrow IPC schema
pub const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";
+
+/// A [`ProjectionMask`] identifies a set of columns within a potentially nested schema to project
+#[derive(Debug, Clone)]
+pub struct ProjectionMask {
+ /// A mask of
+ mask: Option<Vec<bool>>,
+}
+
+impl ProjectionMask {
+ /// Create a [`ColumnMask`] which selects all columns
+ pub fn all() -> Self {
+ Self { mask: None }
+ }
+
+ /// Create a [`ColumnMask`] which selects only the specified leaf columns
+ ///
+ /// Note: repeated or out of order indices will not impact the final mask
+ ///
+ /// i.e. `[0, 1, 2]` will construct the same mask as `[1, 0, 0, 2]`
+ pub fn leaves(
+ schema: &SchemaDescriptor,
Review Comment:
The mask could theoretically carry the schema along and use it as a sanity check, but I'm inclined to think that if a user constructs a mask and then applies it to a different schema, that isn't something we can reasonably be expected to handle sensibly.
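For reference, a minimal sketch of what `leaves` could look like (an assumed shape, not necessarily the code in this PR): each supplied index just flips the corresponding slot to `true`, which is why repeated or out-of-order indices do not change the final mask.

```rust
impl ProjectionMask {
    /// Sketch: select only the leaf columns at the given indices.
    /// Repeated or out-of-order indices are harmless because each one
    /// simply sets the same slot to `true`.
    pub fn leaves(schema: &SchemaDescriptor, indices: impl IntoIterator<Item = usize>) -> Self {
        let mut mask = vec![false; schema.num_columns()];
        for idx in indices {
            mask[idx] = true;
        }
        Self { mask: Some(mask) }
    }
}
```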