[
https://issues.apache.org/jira/browse/HBASE-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser resolved HBASE-18586.
--------------------------------
Resolution: Invalid
Please ask questions such as these on the [email protected]. This JIRA
instance is reserved for concrete code changes, not user support. Thanks.
> Multiple column families - scan performance
> -------------------------------------------
>
> Key: HBASE-18586
> URL: https://issues.apache.org/jira/browse/HBASE-18586
> Project: HBase
> Issue Type: Bug
> Components: scan
> Reporter: PS0618
>
> I have 2 HBase tables: one with a single column family, and the other with 4
> column families. Both tables are keyed by the same rowkey, and each column
> family has a single column qualifier whose value is a JSON string (each JSON
> payload is about 10-20K in size). All column families use fast-diff encoding
> and gzip compression.
> After loading about 60MM rows into each table, a scan test on (any) single
> column family in the 2nd table takes 4x as long as scanning the single column
> family in the 1st table. In both cases, the scanner is bounded by a start
> and stop key to scan 1MM rows. Performance did not change much even after
> running a major compaction on both tables.
> Though the HBase documentation and other tech forums recommend not using more
> than 1 column family per table, nothing I have read so far suggests that scan
> performance will degrade linearly with the number of column families. Has
> anyone else experienced this, and is there a simple explanation for it?
> Note that the second table has 4 column families because, even though I only
> scan one column family at a time now, there are requirements to scan multiple
> column families from that table for a given set of rowkeys.
> Thanks for any insight into the performance question.
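
For a sense of scale, the figures quoted in the report can be turned into a rough estimate of how much uncompressed JSON one bounded scan has to return. This is only a back-of-the-envelope sketch: it assumes "10-20K" means 10 to 20 KiB per value, counts payload bytes only (no rowkeys, qualifiers, or timestamps), and ignores fast-diff encoding and gzip compression. The constant and function names are illustrative and do not come from the report, and the estimate says nothing about the cause of the 4x difference.

```python
# Rough estimate of the uncompressed payload returned by one bounded scan.
# Assumptions (not stated exactly in the report): "10-20K" is read as
# 10-20 KiB per value, and only the JSON payload bytes are counted.

ROWS_SCANNED = 1_000_000                      # "1MM rows" per scan test
PAYLOAD_BYTES_RANGE = (10 * 1024, 20 * 1024)  # assumed 10 KiB to 20 KiB per value


def scan_volume_gib(rows: int, payload_bytes: int) -> float:
    """Return the total payload size in GiB for `rows` values of `payload_bytes` each."""
    return rows * payload_bytes / 1024**3


low_gib, high_gib = (scan_volume_gib(ROWS_SCANNED, p) for p in PAYLOAD_BYTES_RANGE)
print(f"{low_gib:.1f} to {high_gib:.1f} GiB of JSON per column family scanned")
```

Under these assumptions, each 1MM-row scan of a single column family returns on the order of 10 to 20 GiB of uncompressed values.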