emkornfield commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1610705909
##########
src/main/thrift/parquet.thrift:
##########
@@ -835,6 +864,65 @@ struct ColumnMetaData {
16: optional SizeStatistics size_statistics;
}
+struct ColumnChunkMetaDataV3 {
Review Comment:
> Well, ignoring the fact that Parquet is currently not a sparse format,
> your proposal implies that readers have to do a O(n) search to find a given
> column?
IIUC, finding a column via schema elements today is also O(N), assuming no
nesting. I think the difference is that today the first thing implementations
do is create an efficient dictionary structure to amortize the lookup of
further columns.
I think if we want fast lookups without building any additional dictionaries
in memory, we should consider a new stored index structure (or reconsider
how we organize schema elements instead of a straight BFS).
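To illustrate the amortization point, here is a minimal Python sketch. It assumes a flat schema with no nesting, and every name in it (`find_column_linear`, `build_column_index`, `column_names`) is hypothetical and not taken from any Parquet implementation:

```python
from typing import Dict, List, Optional


def find_column_linear(names: List[str], target: str) -> Optional[int]:
    """O(N) scan over a flat list of column names (no nesting assumed)."""
    for position, name in enumerate(names):
        if name == target:
            return position
    return None


def build_column_index(names: List[str]) -> Dict[str, int]:
    """Single O(N) pass that maps each column name to its position.

    After this one-time cost, each lookup is an average O(1) dict access.
    """
    index: Dict[str, int] = {}
    for position, name in enumerate(names):
        # Keep the first occurrence so results match the linear scan.
        index.setdefault(name, position)
    return index


# Hypothetical flat schema, standing in for the list of schema elements.
column_names = ["id", "ts", "user", "amount"]
column_index = build_column_index(column_names)
```

Both approaches pay O(N) for the first lookup. The dictionary only helps for the second and later columns, and it has to be rebuilt in memory each time the footer is read, which is why a stored index would be a different trade-off.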
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]