[ https://issues.apache.org/jira/browse/HBASE-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gavin updated HBASE-18586:
--------------------------
    Comment: was deleted

(was: A comment with security level 'jira-users' was removed.)

> Multiple column families - scan performance
> -------------------------------------------
>
>                 Key: HBASE-18586
>                 URL: https://issues.apache.org/jira/browse/HBASE-18586
>             Project: HBase
>          Issue Type: Bug
>          Components: scan
>            Reporter: PS0618
>            Priority: Major
>
> I have 2 HBase tables: one with a single column family, and the other with 4
> column families. Both tables are keyed by the same rowkey, and each column
> family has a single column qualifier with a JSON string as its value
> (each JSON payload is about 10-20K in size). All column families use
> fast-diff encoding and gzip compression.
> After loading about 60MM rows into each table, a scan test on (any) single
> column family in the 2nd table takes 4x the time needed to scan the single
> column family in the 1st table. In both cases, the scanner is bounded by a
> start and stop key to scan 1MM rows. Performance did not change much even
> after running a major compaction on both tables.
> Though the HBase documentation and other tech forums recommend not using more
> than 1 column family per table, nothing I have read so far suggests that scan
> performance will degrade linearly with the number of column families. Has
> anyone else experienced this, and is there a simple explanation for it?
> Note that the second table has 4 column families because, although I only
> scan one column family at a time now, there are requirements to scan multiple
> column families from that table given a set of rowkeys.
> Thanks for any insight into the performance question.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)