As I said yesterday, you can check the logs, top, etc. One particular thing of interest would be a jstack of your region server's process while it's scanning and not returning.
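For reference, running `jstack <pid>` against the region server's JVM prints the stack of every thread in that process. If it helps to see what kind of information that is, here is a minimal sketch that collects a similar dump from inside a JVM, for example on the client that is stuck in the iterator. It assumes Java 6 or later, and the class name ThreadDumpSketch is made up for illustration; it is not part of HBase and does not replace a jstack of the region server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of HBase: dumps the threads of the JVM it runs in.
public class ThreadDumpSketch {

    /** Builds a jstack-like text dump of every live thread in the current JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details to keep the call cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking a few dumps some seconds apart while the scan hangs shows whether the same threads stay blocked in the same frames.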
J-D

On Wed, Oct 21, 2009 at 7:19 AM, Ananth T. Sarathy <[email protected]> wrote:
> Anyone have any further thoughts on this?
> Ananth T Sarathy
>
> On Tue, Oct 20, 2009 at 6:37 PM, Ananth T. Sarathy <[email protected]> wrote:
>
>> Well that's not the case. Every Row has that column. In fact the second
>> snippet I sent is with a column with many fewer rows (1k vs 25k) but comes
>> back pretty quickly.
>>
>> By forever, I mean I have watched my logs do nothing for a half hour before
>> giving up.
>>
>> Ananth T Sarathy
>>
>> On Tue, Oct 20, 2009 at 5:03 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> If you are asking for a column that is very sparse and doesn't exist,
>>> it will cause HBase to read through the entire region to find 100
>>> matching rows. This could take a while, you said 'forever', but could
>>> you quantify that?
>>>
>>> On Tue, Oct 20, 2009 at 1:58 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> > Scanner pre-fetching is always faster, so something must be wrong with
>>> > your region server. Check the logs, top, etc.
>>> >
>>> > WRT row size, it's pretty much a matter of how many bytes you have
>>> > in each column and sum them up (plus some overhead with the keys).
>>> >
>>> > You want filters, check the filter package in the javadoc.
>>> >
>>> > J-D
>>> >
>>> > On Tue, Oct 20, 2009 at 1:52 PM, Ananth T. Sarathy <[email protected]> wrote:
>>> >> Ok, but how come
>>> >> when I run a similar call (with fewer returned rows, 1000 vs 25k in the
>>> >> previous one) it runs through the iterator very quickly? (See below)
>>> >>
>>> >> Also, how do I determine the row size? It's just text data, and really not
>>> >> much.
>>> >>
>>> >> Finally, is there a way to query for rows that do not have a column? (i.e. all
>>> >> rows without Files:path1)
>>> >>
>>> >>        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
>>> >>                "GS_Applications");
>>> >>
>>> >>        String[] columns = { "Files:path1" };
>>> >>        log.info("Getting all Rows with Files");
>>> >>        Scanner s = htdmni.getScannerForAllRows(columns);
>>> >>        log.info("Got all Rows with Files");
>>> >>
>>> >>        Iterator<RowResult> iter = s.iterator();
>>> >>        out.write("Application_Full_Name,Version,Application_installer_name,Operating System, Application_platform ,Application_sub_category,md5Hash,Sha1Hash,Sha256Hash,filepath,fileName,modified,size,operation\n");
>>> >>        out.write("<BR>");
>>> >>        while (iter.hasNext())
>>> >>        {
>>> >>
>>> >> Ananth T Sarathy
>>> >>
>>> >> On Tue, Oct 20, 2009 at 4:44 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> >>
>>> >>> If you have a very slow data source (S3), then it fetches 100 rows
>>> >>> before coming back to your client with all of them and that can take a
>>> >>> lot of time. Also make sure that 100 of your rows can fit in a region
>>> >>> server's memory. How big is each row?
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>> On Tue, Oct 20, 2009 at 1:32 PM, Ananth T. Sarathy <[email protected]> wrote:
>>> >>> > I am running this code where
>>> >>> >
>>> >>> > getScannerForAllRows(columns) just does return table.getScanner(columns);
>>> >>> >
>>> >>> > and the table has setScannerCaching(100);
>>> >>> >
>>> >>> > But it spins forever after getting the iterator. Why would that be? How can
>>> >>> > I speed it up?
>>> >>> >
>>> >>> >        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
>>> >>> >                "GS_Applications");
>>> >>> >
>>> >>> >        String[] columns = { "Files:Name" };
>>> >>> >        log.info("Getting all Rows with Files");
>>> >>> >        Scanner s = htdmni.getScannerForAllRows(columns);
>>> >>> >        log.info("Got all Rows with Files");
>>> >>> >        log.info("Getting Iterator");
>>> >>> >
>>> >>> >        Iterator<RowResult> iter = s.iterator();
>>> >>> >        log.info("Got Iterator");
>>> >>> >
>>> >>> >        while (iter.hasNext())
>>> >>> >        {
>>> >>> >            log.info("Getting next Row");
>>> >>> >            RowResult rr = iter.next();
>>> >>> >
>>> >>> >
>>> >>> > Ananth T Sarathy
