If that's all you get, a thread dump of 10.242.71.191:60020 would help
($ kill -QUIT RS_PID). The thread dump will be in the RS .out file. Take a few.
Paste them to pastebin.
Thanks Ananth,
St.Ack

On Wed, Oct 21, 2009 at 8:05 AM, Ananth T. Sarathy <
[email protected]> wrote:

> Yeah,
> I just don't understand why getScanner("Column:X") returns the iterator
> and processes the rows, yet getScanner("Column:Y") just spins and spins,
> even though Column:Y is a much denser result.
>
> When I load from the shell:
>
> Version: 0.20.0, r810752, Thu Sep  3 00:06:18 PDT 2009
> hbase(main):001:0> count 'GS_Applications'
> 09/10/21 11:04:48 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
> /hbase/root-region-server got 10.244.9.171:60020
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$ClientZKWatcher: Got
> ZooKeeper event, state: SyncConnected, type: None, path: null
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Found ROOT
> at 10.244.9.171:60020
> 09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Cached
> location address: 10.245.82.160:60020, regioninfo: REGION => {NAME =>
> '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE =>
> {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
> FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION =>
> 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION =>
> 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}]}}
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached
> location address: 10.242.71.191:60020, regioninfo: REGION => {NAME =>
> 'GS_Applications,,1255020109210', STARTKEY => '', ENDKEY => '', ENCODED =>
> 1732076772, TABLE => {{NAME => 'GS_Applications', FAMILIES => [{NAME =>
> 'Application', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
> 'BinaryData', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
> 'BinaryRetrieval', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}, {NAME => 'Files', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}, {NAME => 'Info', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}, {NAME => 'Network', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}, {NAME => 'Registry', VERSIONS => '1', COMPRESSION => 'NONE', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}]}}
> 09/10/21 11:04:49 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
> /hbase/root-region-server got 10.244.9.171:60020
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Found ROOT
> at 10.244.9.171:60020
> 09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached
> location address: 10.245.82.160:60020, regioninfo: REGION => {NAME =>
> '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE =>
> {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
> FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION =>
> 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION =>
> 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}]}}
> 09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Creating scanner over
> GS_Applications starting at key ''
> 09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Advancing internal
> scanner to startKey at ''
> 09/10/21 11:04:50 DEBUG client.HConnectionManager$TableServers: Cache hit
> for row <> in tableName GS_Applications: location server 10.242.71.191:60020,
> location region name GS_Applications,,1255020109210
>
>
> Ananth T Sarathy
>
>
> On Wed, Oct 21, 2009 at 11:00 AM, stack <[email protected]> wrote:
>
> > In both cases you are doing a full table scan?
> >
> > Try from the shell with DEBUG enabled. You'll see the regions being loaded.
> > That may help you narrow in on the problem region, or at least on the
> > problem regionserver.
> >
> > St.Ack
> >
> > On Wed, Oct 21, 2009 at 7:19 AM, Ananth T. Sarathy <
> > [email protected]> wrote:
> >
> > > Anyone have any further thoughts on this?
> > > Ananth T Sarathy
> > >
> > >
> > > On Tue, Oct 20, 2009 at 6:37 PM, Ananth T. Sarathy <
> > > [email protected]> wrote:
> > >
> > > > Well, that's not the case. Every row has that column. In fact, the
> > > > second snippet I sent uses a column with far fewer rows (1k vs 25k),
> > > > but it comes back pretty quickly.
> > > >
> > > > By "forever", I mean I have watched my logs do nothing for half an
> > > > hour before giving up.
> > > >
> > > >
> > > > Ananth T Sarathy
> > > >
> > > >
> > > >
> > > > On Tue, Oct 20, 2009 at 5:03 PM, Ryan Rawson <[email protected]> wrote:
> > > >
> > > >> If you are asking for a column that is very sparse (it doesn't exist
> > > >> in most rows), HBase has to read through the entire region to find
> > > >> 100 matching rows. That could take a while. You said 'forever'; could
> > > >> you quantify that?
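Ryan's point can be made concrete with a small model: a scan with caching set to 100 cannot return its first batch until it has found 100 rows that contain the requested column. The sketch below is a hypothetical simulation, not HBase code; the class and method names are invented for illustration, and "row i has the column" is supplied as a predicate.

```java
import java.util.function.IntPredicate;

// Hypothetical model, not HBase code: how many rows a region scan has to read
// before it can hand back one full batch of `caching` matching rows.
final class SparseScanModel {

    private SparseScanModel() {}

    // hasColumn.test(i) is true when row i contains the requested column.
    // Returns the number of rows read to collect `caching` matches, or
    // regionRows when the region runs out before the batch is full.
    static int rowsReadForFirstBatch(int regionRows, IntPredicate hasColumn, int caching) {
        int matched = 0;
        for (int row = 0; row < regionRows; row++) {
            if (hasColumn.test(row) && ++matched == caching) {
                return row + 1; // rows 0..row have all been read
            }
        }
        return regionRows; // fewer than `caching` matches: the whole region was read
    }

    public static void main(String[] args) {
        int rows = 25_000;
        // Dense column: every row matches, so the first 100 rows fill the batch.
        System.out.println(rowsReadForFirstBatch(rows, r -> true, 100));
        // Sparse column: only every 500th row matches, so the whole region is read.
        System.out.println(rowsReadForFirstBatch(rows, r -> r % 500 == 0, 100));
    }
}
```

Running `main` prints 100 and then 25000: with a dense column the first batch costs 100 row reads, while with the sparse column the server reads all 25,000 rows and still returns only 50 matches.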
> > > >>
> > > >> On Tue, Oct 20, 2009 at 1:58 PM, Jean-Daniel Cryans <[email protected]>
> > > >> wrote:
> > > >> > Scanner pre-fetching is always faster, so something must be wrong with
> > > >> > your region server. Check the logs, top, etc.
> > > >> >
> > > >> > As for row size, it's pretty much a matter of how many bytes you have
> > > >> > in each column, summed up (plus some overhead for the keys).
> > > >> >
> > > >> > To find rows without a column, you want filters; check the filter
> > > >> > package in the javadoc.
> > > >> >
> > > >> > J-D
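J-D's rule of thumb (sum the bytes in each column, plus key overhead) can be sketched as below. This assumes the HBase KeyValue serialization, where each cell is stored as key length (int), value length (int), row length (short), row, family length (byte), family, qualifier, timestamp (long), key type (byte), value; that gives 20 fixed bytes per cell. It ignores HFile block/index overhead and in-memory object overhead, and the class name and sample values are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Rough per-row size estimate, assuming the KeyValue cell layout described above.
final class RowSizeEstimator {
    // keylength(4) + valuelength(4) + rowlength(2) + familylength(1) + timestamp(8) + type(1)
    static final int FIXED_CELL_OVERHEAD = 20;

    private RowSizeEstimator() {}

    static long cellSize(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        return FIXED_CELL_OVERHEAD + row.length + family.length + qualifier.length + value.length;
    }

    // columns maps "family:qualifier" to its (text) value
    static long rowSize(String row, Map<String, String> columns) {
        byte[] rowBytes = row.getBytes(StandardCharsets.UTF_8);
        long total = 0;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String[] parts = e.getKey().split(":", 2);
            byte[] family = parts[0].getBytes(StandardCharsets.UTF_8);
            byte[] qualifier = (parts.length > 1 ? parts[1] : "").getBytes(StandardCharsets.UTF_8);
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            total += cellSize(rowBytes, family, qualifier, value);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical row with two small text cells in the Files family.
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("Files:Name", "setup.exe");
        cols.put("Files:path1", "C:/Program Files/App");
        System.out.println(rowSize("row-0001", cols)); // 46 + 58 = 104 bytes
    }
}
```

Because the row key and family name are repeated in every cell, a row with many small text columns can be noticeably larger than its raw values suggest.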
> > > >> >
> > > >> > On Tue, Oct 20, 2009 at 1:52 PM, Ananth T. Sarathy
> > > >> > <[email protected]> wrote:
> > > >> >> OK, but how come when I run a similar call (with fewer returned rows,
> > > >> >> 1000 vs 25k in the previous one) it runs through the iterator very
> > > >> >> quickly? (See below.)
> > > >> >>
> > > >> >> Also, how do I determine the row size? It's just text data, and really
> > > >> >> not much.
> > > >> >>
> > > >> >> Finally, is there a way to query for rows that do not have a column?
> > > >> >> (i.e. all rows without Files:path1)
> > > >> >>
> > > >> >>        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
> > > >> >>                "GS_Applications");
> > > >> >>
> > > >> >>        String[] columns = { "Files:path1" };
> > > >> >>        log.info("Getting all Rows with Files");
> > > >> >>        Scanner s = htdmni.getScannerForAllRows(columns);
> > > >> >>        log.info("Got all Rows with Files");
> > > >> >>
> > > >> >>        Iterator<RowResult> iter = s.iterator();
> > > >> >>        out.write("Application_Full_Name,Version,Application_installer_name,"
> > > >> >>                + "Operating System, Application_platform,Application_sub_category,"
> > > >> >>                + "md5Hash,Sha1Hash,Sha256Hash,filepath,fileName,modified,size,operation\n");
> > > >> >>        out.write("<BR>");
> > > >> >>        while (iter.hasNext())
> > > >> >>        {
> > > >> >> Ananth T Sarathy
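On the question of rows without Files:path1: a scan restricted to Files:path1 only returns rows that have that column, so the absent rows never reach the client. One workaround is to scan a column or family every row has and select the rows lacking Files:path1 on the client. The sketch below shows only that selection step; the row representation (row key to "family:qualifier" to value) and the class name are hypothetical, and a server-side filter from HBase's filter package, as J-D suggests, would avoid shipping every row to the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Client-side sketch: given rows already fetched (modeled as a hypothetical
// map of row key -> "family:qualifier" -> value), return the keys of rows
// that do NOT contain `column`.
final class MissingColumnSelector {
    private MissingColumnSelector() {}

    static List<String> rowsWithout(Map<String, Map<String, byte[]>> rows, String column) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, byte[]>> row : rows.entrySet()) {
            if (!row.getValue().containsKey(column)) {
                missing.add(row.getKey());
            }
        }
        return missing;
    }
}
```

For example, if row1 has both Files:Name and Files:path1 and row2 has only Files:Name, `rowsWithout(rows, "Files:path1")` returns just row2. The result follows the iteration order of the map passed in, so a sorted map keeps row-key order.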
> > > >> >>
> > > >> >>
> > > >> >> On Tue, Oct 20, 2009 at 4:44 PM, Jean-Daniel Cryans
> > > >> >> <[email protected]> wrote:
> > > >> >>
> > > >> >>> If you have a very slow data source (S3), then it fetches 100 rows
> > > >> >>> before coming back to your client with all of them, and that can take
> > > >> >>> a lot of time. Also make sure that 100 of your rows can fit in a
> > > >> >>> region server's memory. How big is each row?
> > > >> >>>
> > > >> >>> J-D
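J-D's memory check is simple arithmetic: one batch holds roughly caching × average row size bytes. The hypothetical helper below picks the largest caching value (capped at, say, the 100 used in this thread) whose batch fits a memory budget you choose. The budget is an assumption you supply; it is not an HBase setting.

```java
// Hypothetical helper: largest scanner-caching value such that
// caching * avgRowBytes <= budgetBytes, clamped to [1, upperBound].
final class CachingBudget {
    private CachingBudget() {}

    static int maxCaching(long budgetBytes, long avgRowBytes, int upperBound) {
        if (budgetBytes <= 0 || avgRowBytes <= 0 || upperBound < 1) {
            throw new IllegalArgumentException("budget, row size and upper bound must be positive");
        }
        long fits = budgetBytes / avgRowBytes;
        // Always fetch at least one row per round trip, and never exceed upperBound.
        return (int) Math.max(1, Math.min(fits, upperBound));
    }

    public static void main(String[] args) {
        // ~2 KB rows: 100 rows need ~200 KB, so a 1 MB budget allows all 100.
        System.out.println(maxCaching(1L << 20, 2048, 100));      // 100
        // ~50 KB rows: only 20 fit in 1 MB.
        System.out.println(maxCaching(1L << 20, 50 * 1024, 100)); // 20
    }
}
```

A lower caching value means more round trips but a shorter wait for each batch, which matters when each row is slow to read.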
> > > >> >>>
> > > >> >>> On Tue, Oct 20, 2009 at 1:32 PM, Ananth T. Sarathy
> > > >> >>> <[email protected]> wrote:
> > > >> >>> > I am running this code, where
> > > >> >>> >
> > > >> >>> > getScannerForAllRows(columns) just does return table.getScanner(columns);
> > > >> >>> >
> > > >> >>> > and the table has setScannerCaching(100);
> > > >> >>> >
> > > >> >>> > But it spins forever after getting the iterator. Why would that be?
> > > >> >>> > How can I speed it up?
> > > >> >>> >
> > > >> >>> >        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
> > > >> >>> >                "GS_Applications");
> > > >> >>> >
> > > >> >>> >        String[] columns = { "Files:Name" };
> > > >> >>> >        log.info("Getting all Rows with Files");
> > > >> >>> >        Scanner s = htdmni.getScannerForAllRows(columns);
> > > >> >>> >        log.info("Got all Rows with Files");
> > > >> >>> >        log.info("Getting Iterator");
> > > >> >>> >
> > > >> >>> >        Iterator<RowResult> iter = s.iterator();
> > > >> >>> >        log.info("Got Iterator");
> > > >> >>> >
> > > >> >>> >        while (iter.hasNext())
> > > >> >>> >        {
> > > >> >>> >            log.info("Getting next Row");
> > > >> >>> >            RowResult rr = iter.next();
> > > >> >>> >
> > > >> >>> >
> > > >> >>> > Ananth T Sarathy
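In the snippet above, "Getting next Row" is logged inside the loop, so a hang right after "Got Iterator" means the first `iter.hasNext()` call is what blocks. A small wrapper that times each `hasNext()` call can show whether the scan is slow but progressing (long gaps at each batch boundary) or stuck outright. The class below is a hypothetical diagnostic sketch that works with any `Iterator`; a call that never returns is never recorded, which is where the thread dumps St.Ack asked for come in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Diagnostic sketch: wraps any Iterator and records how long each hasNext()
// call blocks, in nanoseconds.
final class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final List<Long> hasNextNanos = new ArrayList<>();

    TimedIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        boolean more = delegate.hasNext();
        hasNextNanos.add(System.nanoTime() - start);
        return more;
    }

    @Override
    public T next() {
        return delegate.next();
    }

    // Copy of the durations of every completed hasNext() call so far.
    List<Long> hasNextTimings() {
        return new ArrayList<>(hasNextNanos);
    }
}
```

In the snippet, this would mean wrapping the scanner's iterator, e.g. `new TimedIterator<>(s.iterator())`, and logging `hasNextTimings()` every few hundred rows; with caching at 100, long values would be expected roughly once per 100 rows.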
> > > >> >>> >
> > > >> >>>
> > > >> >>
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
