mapleFU commented on code in PR #37400:
URL: https://github.com/apache/arrow/pull/37400#discussion_r2158222425
##########
cpp/src/parquet/metadata.h:
##########
@@ -505,21 +504,40 @@ class PARQUET_EXPORT RowGroupMetaDataBuilder {
std::unique_ptr<RowGroupMetaDataBuilderImpl> impl_;
};
+/// Alias type for the page index locations of a row group, indexed by
+/// column ordinal. If a column does not have a page index, its entry is
+/// std::nullopt.
+using RowGroupIndexLocation = std::vector<std::optional<IndexLocation>>;
+
+/// Alias type for the bloom filter locations of a row group, keyed by
+/// column ordinal.
+///
+/// The number of columns with a bloom filter is expected to be small
+/// relative to the total number of columns, so a map is used.
+using RowGroupBloomFilterLocation = std::map<int32_t, IndexLocation>;
+
+/// Alias type for the page index locations of a parquet file, keyed by
+/// row group ordinal.
+using FileIndexLocation = std::map<size_t, RowGroupIndexLocation>;
+
+/// Alias type for the bloom filter locations of a parquet file, keyed by
+/// row group ordinal.
+using FileBloomFilterLocation = std::map<size_t, RowGroupBloomFilterLocation>;
Review Comment:
> Why std::map instead of std::unordered_map?
Either is fine with me. I used std::map so that, when flushing, the bloom
filters are written in order (by row group ordinal, then column ordinal),
but that ordering is not required.
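A minimal sketch of the ordering point, assuming a stand-in `IndexLocation` struct (the fields here are an assumption, not copied from the Parquet sources) and a hypothetical `FlushOrder` helper: iterating the nested `std::map` visits row groups and then columns in ascending ordinal order, whatever the insertion order. With `std::unordered_map` the iteration order would be unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet's IndexLocation (byte offset and length).
struct IndexLocation {
  int64_t offset;
  int32_t length;
};

// Same shapes as the aliases in the diff.
using RowGroupBloomFilterLocation = std::map<int32_t, IndexLocation>;
using FileBloomFilterLocation = std::map<size_t, RowGroupBloomFilterLocation>;

// Lists (row group, column) keys in the order the nested std::map iterates
// them: ascending row group ordinal, then ascending column ordinal.
std::vector<std::pair<size_t, int32_t>> FlushOrder(
    const FileBloomFilterLocation& file) {
  std::vector<std::pair<size_t, int32_t>> order;
  for (const auto& [row_group, columns] : file) {
    for (const auto& entry : columns) {
      order.emplace_back(row_group, entry.first);
    }
  }
  return order;
}

// Inserts locations out of order and checks that iteration is still sorted.
void ExampleFlushOrder() {
  FileBloomFilterLocation file;
  file[1][3] = IndexLocation{4096, 128};
  file[0][5] = IndexLocation{1024, 256};
  file[0][2] = IndexLocation{0, 512};
  const auto order = FlushOrder(file);
  assert(order.size() == 3);
  assert(order[0] == std::make_pair(size_t{0}, int32_t{2}));
  assert(order[1] == std::make_pair(size_t{0}, int32_t{5}));
  assert(order[2] == std::make_pair(size_t{1}, int32_t{3}));
}
```

The trade-off is the usual one: `std::map` gives deterministic, sorted iteration at O(log n) lookup cost, which is negligible for the small number of bloom filters per file.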
> Also, does the two-level map really make sense?
It might: some bloom filters can be of poor quality, for example one built
for a final row group that has only 10 rows. That is what I mean by "bad
quality".
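To illustrate how a two-level map handles the small-final-row-group case, here is a sketch that assumes a hypothetical writer policy (not part of this PR) of skipping the bloom filter for row groups below `kMinRowsForBloomFilter` rows. The outer map simply has no entry for a skipped row group. `MaybeRecordBloomFilter`, `CountBloomFilters`, the threshold value, and the `IndexLocation` fields are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for parquet's IndexLocation (byte offset and length).
struct IndexLocation {
  int64_t offset;
  int32_t length;
};

// Same shapes as the aliases in the diff.
using RowGroupBloomFilterLocation = std::map<int32_t, IndexLocation>;
using FileBloomFilterLocation = std::map<size_t, RowGroupBloomFilterLocation>;

// Hypothetical policy: row groups with fewer rows than this get no bloom filter.
constexpr int64_t kMinRowsForBloomFilter = 1000;

// Records a column's bloom filter location unless the row group is too small.
// Returns true if an entry was recorded.
bool MaybeRecordBloomFilter(FileBloomFilterLocation& file, size_t row_group,
                            int64_t num_rows, int32_t column,
                            IndexLocation location) {
  if (num_rows < kMinRowsForBloomFilter) {
    return false;  // skipped row group: no outer-map entry is created
  }
  file[row_group][column] = location;
  return true;
}

// Counts bloom filters across all row groups (sum of inner map sizes).
size_t CountBloomFilters(const FileBloomFilterLocation& file) {
  size_t count = 0;
  for (const auto& entry : file) {
    count += entry.second.size();
  }
  return count;
}

// Two large row groups followed by a 10-row final row group.
void ExampleSkipSmallRowGroup() {
  FileBloomFilterLocation file;
  const bool rg0 = MaybeRecordBloomFilter(file, 0, 100000, 2, IndexLocation{0, 512});
  const bool rg1 = MaybeRecordBloomFilter(file, 1, 100000, 2, IndexLocation{512, 512});
  const bool rg2 = MaybeRecordBloomFilter(file, 2, 10, 2, IndexLocation{1024, 512});
  assert(rg0 && rg1 && !rg2);
  assert(file.size() == 2);  // row group 2 has no entry at all
  assert(file.count(2) == 0);
  assert(CountBloomFilters(file) == 2);
}
```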