Sorry, I meant scan caching (not batching).
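
A rough back-of-the-envelope sketch of why caching matters, using the figures from the quoted messages below (300 million rows, roughly 24 hours for the scan). It assumes a scanner caching value of 1 means one RPC per row, which is what the "300m RPC requests" estimate implies. The caching values 100 and 1000 and the helper names are illustrative only, not measured results or tuning advice.

```python
ROWS = 300_000_000            # table size reported in the original message
SCAN_SECONDS = 24 * 60 * 60   # "around 24 hours" for the full scan


def rpc_count(rows: int, caching: int) -> int:
    """Round trips needed when each scanner call returns up to `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    # Integer ceiling division: a final partial batch still costs one RPC.
    return (rows + caching - 1) // caching


def per_row_ms(rows: int, seconds: int) -> float:
    """Average wall-clock time spent per row, in milliseconds."""
    return seconds * 1000 / rows


if __name__ == "__main__":
    print(f"observed cost per row: {per_row_ms(ROWS, SCAN_SECONDS):.3f} ms")
    for caching in (1, 100, 1000):  # hypothetical settings for comparison
        print(f"caching={caching:>4}: {rpc_count(ROWS, caching):,} RPCs")
```

With these inputs the observed scan works out to about 0.29 ms per row, which is in the range of a network round trip rather than a disk read, and a caching value of 1000 would cut the round trips from 300 million to 300,000. On the HBase client side the setting is `Scan.setCaching(int)` or the `hbase.client.scanner.caching` property; how Hive exposes it depends on the Hive version in use.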
________________________________
From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
Sent: Friday, January 25, 2013 2:00 PM
Subject: Re: Hbase scans taking a lot of time

Enable scan batching in Hive.
You're probably performing 300m RPC requests, i.e. you're mostly measuring network latency.

-- Lars

________________________________
From: Vibhav Mundra <[email protected]>
To: [email protected]; [email protected]
Sent: Friday, January 25, 2013 1:10 AM
Subject: Hbase scans taking a lot of time

I am facing a very strange problem with HBase.

This is what I did:
a) Created a table using pre-partitioned splits.
b) The column families are compressed with LZO.
c) With this configuration I am able to populate 2 million rows per minute in HBase.
d) I have created a table with 300 million odd rows, which took me roughly 3 hours to populate; the data size is 25 GB.
e) But when I query the data, the performance I am getting is very bad.

Basically this is what I am seeing: high CPU, no disk I/O, and network I/O happening at the rate of 6~7 MB/sec.

Because of this, scanning the entries of the table using Hive takes ages. Basically it is taking around 24 hours to scan the table.

Any idea how to debug this?

-Vibhav
