If that's all you get, a thread dump on 10.242.71.191:60020 would help ($ kill -QUIT RS_PID). The thread dump will be in the RS .out file. Do a few. Paste to pastebin.

Thanks Ananth,
St.Ack
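
For the "do a few" part, a small helper along these lines might save some typing. It is only a sketch: the function name `dump_threads` and the default count and interval are made up for illustration, and RS_PID is the regionserver's process id as above. It only sends the signal; the dumps themselves still land in the RS .out file.

```shell
# Hypothetical helper (not part of HBase): ask a JVM for several thread
# dumps, spaced apart so that successive dumps can be compared.
# Usage: dump_threads PID [COUNT] [INTERVAL_SECONDS]
dump_threads() {
    pid=$1
    count=${2:-3}       # assumed default: three dumps
    interval=${3:-10}   # assumed default: ten seconds apart
    i=1
    while [ "$i" -le "$count" ]; do
        # SIGQUIT does not stop a JVM; it makes it print a thread dump
        # to its stdout, which the regionserver redirects to its .out file.
        kill -QUIT "$pid" || return 1
        echo "sent SIGQUIT $i/$count to $pid"
        # Wait between dumps, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example: dump_threads "$RS_PID" 3 10
```

Threads that show the same stack in every dump are the ones worth looking at first.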

On Wed, Oct 21, 2009 at 8:05 AM, Ananth T. Sarathy <[email protected]> wrote:

> yeah,
> I just don't understand why getScanner("Column:X") returns the iterator and processes them, yet getScanner("Column:Y") just spins and spins, even though Column:Y is a much denser result.
>
> When I load from shell
>
> *Version: 0.20.0, r810752, Thu Sep 3 00:06:18 PDT 2009
> hbase(main):001:0> count 'GS_Applications'
> 09/10/21 11:04:48 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.244.9.171:60020
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: SyncConnected, type: None, path: null
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Found ROOT at 10.244.9.171:60020
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Cached location address: 10.245.82.160:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached location address: 10.242.71.191:60020, regioninfo: REGION => {NAME => 'GS_Applications,,1255020109210', STARTKEY => '', ENDKEY => '', ENCODED => 1732076772, TABLE => {{NAME => 'GS_Applications', FAMILIES => [{NAME => 'Application', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'BinaryData', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'BinaryRetrieval', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Files', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Info', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Network', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Registry', VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> 09/10/21 11:04:49 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.244.9.171:60020
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Found ROOT at 10.244.9.171:60020
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached location address: 10.245.82.160:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> 09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Creating scanner over GS_Applications starting at key ''
> 09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
> 09/10/21 11:04:50 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName GS_Applications: location server 10.242.71.191:60020, location region name GS_Applications,,1255020109210
>
>
> *Ananth T Sarathy
>
>
> On Wed, Oct 21, 2009 at 11:00 AM, stack <[email protected]> wrote:
>
> > In both cases you are doing a full table scan?
> >
> > Try from shell with DEBUG enabled. You'll see the regions being loaded. May help you narrow in on problem region or at least on problem regionserver.
> >
> > St.Ack
> >
> > On Wed, Oct 21, 2009 at 7:19 AM, Ananth T. Sarathy <[email protected]> wrote:
> >
> > > Anyone have any further thoughts on this?
> > > Ananth T Sarathy
> > >
> > >
> > > On Tue, Oct 20, 2009 at 6:37 PM, Ananth T. Sarathy <[email protected]> wrote:
> > >
> > > > Well that's not the case. Every row has that column. In fact the second snippet I sent is with a column with far fewer rows (1k vs 25k), but it comes back pretty quickly.
> > > >
> > > > By forever, I mean I have watched my logs do nothing for a half hour before giving up.
> > > >
> > > >
> > > > Ananth T Sarathy
> > > >
> > > >
> > > > On Tue, Oct 20, 2009 at 5:03 PM, Ryan Rawson <[email protected]> wrote:
> > > >
> > > >> If you are asking for a column that is very sparse and doesn't exist, it will cause HBase to read through the entire region to find 100 matching rows. This could take a while; you said 'forever', but could you quantify that?
> > > >>
> > > >> On Tue, Oct 20, 2009 at 1:58 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > > >> > Scanner pre-fetching is always faster, so something must be wrong with your region server. Check the logs, top, etc.
> > > >> >
> > > >> > WRT row size, it's pretty much a matter of how many bytes you have in each column and summing them up (plus some overhead with the keys).
> > > >> >
> > > >> > You want filters, check the filter package in the javadoc.
> > > >> >
> > > >> > J-D
> > > >> >
> > > >> > On Tue, Oct 20, 2009 at 1:52 PM, Ananth T. Sarathy <[email protected]> wrote:
> > > >> >> Ok, but how come when I run a similar call (with fewer returned rows, 1000 vs 25k in the previous one) it runs through the iterator very quickly? (See below)
> > > >> >>
> > > >> >> Also, how do I determine the row size? It's just text data, and really not much.
> > > >> >>
> > > >> >> Finally, is there a way to query for rows that do not have a column? (I.e. all rows without Files:path1)
> > > >> >>
> > > >> >>         HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl("GS_Applications");
> > > >> >>
> > > >> >>         String[] columns = { "Files:path1" };
> > > >> >>         log.info("Getting all Rows with Files");
> > > >> >>         Scanner s = htdmni.getScannerForAllRows(columns);
> > > >> >>         log.info("Got all Rows with Files");
> > > >> >>
> > > >> >>         Iterator<RowResult> iter = s.iterator();
> > > >> >>         out.write("Application_Full_Name,Version,Application_installer_name,Operating System, Application_platform ,Application_sub_category,md5Hash,Sha1Hash,Sha256Hash,filepath,fileName,modified,size,operation\n");
> > > >> >>         out.write("<BR>");
> > > >> >>         while (iter.hasNext())
> > > >> >>         {
> > > >> >>
> > > >> >> Ananth T Sarathy
> > > >> >>
> > > >> >>
> > > >> >> On Tue, Oct 20, 2009 at 4:44 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > > >> >>
> > > >> >>> If you have a very slow data source (S3), then it fetches 100 rows before coming back to your client with all of them, and that can take a lot of time. Also make sure that 100 of your rows can fit in a region server's memory. How big is each row?
> > > >> >>>
> > > >> >>> J-D
> > > >> >>>
> > > >> >>> On Tue, Oct 20, 2009 at 1:32 PM, Ananth T. Sarathy <[email protected]> wrote:
> > > >> >>> > I am running this code where
> > > >> >>> >
> > > >> >>> > getScannerForAllRows(columns) just does return table.getScanner(columns);
> > > >> >>> >
> > > >> >>> > and the table has setScannerCaching(100);
> > > >> >>> >
> > > >> >>> > But it spins forever after getting the iterator. Why would that be? How can I speed it up?
> > > >> >>> >
> > > >> >>> >         HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl("GS_Applications");
> > > >> >>> >
> > > >> >>> >         String[] columns = { "Files:Name" };
> > > >> >>> >         log.info("Getting all Rows with Files");
> > > >> >>> >         Scanner s = htdmni.getScannerForAllRows(columns);
> > > >> >>> >         log.info("Got all Rows with Files");
> > > >> >>> >         log.info("Getting Iterator");
> > > >> >>> >
> > > >> >>> >         Iterator<RowResult> iter = s.iterator();
> > > >> >>> >         log.info("Got Iterator");
> > > >> >>> >
> > > >> >>> >         while (iter.hasNext())
> > > >> >>> >         {
> > > >> >>> >             log.info("Getting next Row");
> > > >> >>> >             RowResult rr = iter.next();
> > > >> >>> >
> > > >> >>> >
> > > >> >>> > Ananth T Sarathy
