Hi everyone,

I have a question for anyone with large datasets in HDFS about row group and HDFS block alignment: how well are row groups and HDFS blocks aligned in practice?

Say your row group size is equal to the HDFS block size. How many blocks does it take, on average, before a row group is significantly split between two blocks? Or, put differently, how much shorter than the planned row group size are row groups, on average?
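To make the question concrete, here is a rough sketch of the drift I have in mind. It assumes every row group comes out the same fixed amount short of the block size and that row groups are written back to back, which real files won't match exactly. The function name, the sizes, and the 10% "significant" threshold are all made up for illustration.

```python
def first_significant_split(block_size, row_group_size, threshold=0.1, max_groups=100_000):
    """Return the index of the first row group that is significantly split.

    Assumption: every row group has the same size and row groups are
    written back to back from offset 0. A row group counts as
    "significantly split" when the smaller of its two pieces across a
    block boundary is at least threshold * row_group_size.
    Returns None if no such row group is found within max_groups.
    """
    start = 0
    for index in range(max_groups):
        end = start + row_group_size
        # First block boundary strictly after the start of this row group.
        boundary = (start // block_size + 1) * block_size
        if start < boundary < end:
            smaller_piece = min(boundary - start, end - boundary)
            if smaller_piece >= threshold * row_group_size:
                return index
        start = end
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 128 MiB blocks, row groups 1% short of that.
    block = 128 * 1024 * 1024
    row_group = block - block // 100
    print(first_significant_split(block, row_group))
```

With units of 100 for the block and 99 for the row group, the row group at index 10 is the first one with at least 10% of its bytes on the wrong side of a boundary. What I don't know is how large and how variable the shortfall is in real datasets, which is what I'm asking about.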

I'm trying to find out whether it would be a significant benefit to use something like variable-length HDFS blocks, added in HDFS-3689 [1], to keep the two aligned.

Thanks!

rb

[1]: https://issues.apache.org/jira/browse/HDFS-3689


--
Ryan Blue
Software Engineer
Cloudera, Inc.
