55 rows/sec? What's your row size? What % of your reads are hitting the cache and what % are going to the disk?

One of the things you can do to improve the random read performance is reduce the HFile block size.

-ak

On Wed, Oct 26, 2011 at 12:51 PM, Vladimir Rodionov <[email protected]> wrote:
>
> We have a reporting tool which runs queries against Oracle DB, collects fact ids and then
> queries HBase for these facts (one-by-one). This is single thread, simple Get op
>
> It is slow, of course. 5 hours to retrieve 1M facts from HBase storage. Approx 55 rows per sec
>
> I know I can use batch get to increase the speed but my question is what
> else we can do to make our ops team happier?
>
> How to optimize random I/O performance in HBase (hi, Facebook we have the
> same problem as you guys :)
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Gary Helmling [[email protected]]
> Sent: Wednesday, October 26, 2011 12:34 PM
> To: [email protected]
> Subject: Re: proposal for naming convention of patches for TRUNK
>
> Also should be possible to use the file command?
>
> $ file HBASE-4680.txt
> HBASE-4680.txt: diff output text
>
>
>
> On Wed, Oct 26, 2011 at 12:32 PM, Ted Yu <[email protected]> wrote:
> > Looping in Giri.
> >
> > Giri:
> > Do you think you have enough heuristics for the filter ?
> >
> > Thanks
> >
> > On Wed, Oct 26, 2011 at 12:29 PM, Todd Lipcon <[email protected]> wrote:
> >
> >> Should be pretty easy to use grep to determine if a file is a patch or
> >> not. Patch files have lines starting with "---" and "+++".
> >>
> >>
> >> On Wed, Oct 26, 2011 at 11:58 AM, Ted Yu <[email protected]> wrote:
> >> > #1 is reasonable.
> >> >
> >> > For #2, the following would be included for test validation:
> >> >
> >> > how-to-reproduce-the-problem.txt
> >> > script-I-used.txt
> >> >
> >> > Just a few examples.
> >> >
> >> > On Wed, Oct 26, 2011 at 11:52 AM, Jonathan Hsieh <[email protected]> wrote:
> >> >
> >> >> Suggestion:
> >> >>
> >> >> 1) Don't run check if the apache inclusion flag isn't checked?
> >> >> 2) Require extension to be .diff, .patch, or .txt?
> >> >>
> >> >> Jon.
> >> >>
> >> >> On Wed, Oct 26, 2011 at 11:37 AM, Ted Yu <[email protected]> wrote:
> >> >>
> >> >> > How do we exclude non-patch attachments, such as
> >> >> > EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo
> >> >> > <http://issues.apache.org/jira/secure/attachment/12500832/EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo>?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Wed, Oct 26, 2011 at 11:32 AM, Todd Lipcon <[email protected]> wrote:
> >> >> >
> >> >> > > I prefer to default to trunk, and require a -0.90 or -0.92 to
> >> >> > > delineate a different branch. Most patches should be against trunk, so
> >> >> > > let's optimize for the common case.
> >> >> > >
> >> >> > > -Todd
> >> >> > >
> >> >> > > On Wed, Oct 26, 2011 at 11:04 AM, Ted Yu <[email protected]> wrote:
> >> >> > > > Hi,
> >> >> > > > I am working with Giri on a filter that should help us avoid the following
> >> >> > > > (see HBASE-4377):
> >> >> > > >
> >> >> > > > -1 overall. Here are the results of testing the latest attachment
> >> >> > > > http://issues.apache.org/jira/secure/attachment/12500832/EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo
> >> >> > > > against trunk revision .
> >> >> > > >
> >> >> > > > I am proposing the following convention: TRUNK patch filename should contain
> >> >> > > > the word 'trunk' in a prominent manner - surrounded by either dash or dot.
> >> >> > > > Valid examples are:
> >> >> > > >
> >> >> > > > hbase-4377.trunk.v4.txt
> >> >> > > > <https://issues.apache.org/jira/secure/attachment/12500830/hbase-4377.trunk.v4.txt>
> >> >> > > >
> >> >> > > > hbase-4377-trunk.v2.patch
> >> >> > > > <https://issues.apache.org/jira/secure/attachment/12497503/hbase-4377-trunk.v2.patch>
> >> >> > > >
> >> >> > > > 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch
> >> >> > > > <https://issues.apache.org/jira/secure/attachment/12499805/0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch>
> >> >> > > >
> >> >> > > > This would allow Giri to write filter that correctly uploads patch for TRUNK
> >> >> > > > to Jenkins for test build.
> >> >> > > >
> >> >> > > > Please provide your comments.
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Todd Lipcon
> >> >> > > Software Engineer, Cloudera
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> // Jonathan Hsieh (shay)
> >> >> // Software Engineer, Cloudera
> >> >> // [email protected]
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
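
On the batch get mentioned in the quoted message: a rough sketch of what batching the fact ids could look like on the client side, so that each round trip carries many ids instead of one. This is only an illustration under assumptions, not the reporting tool's actual code. The class name BatchFetchSketch, the partition and fetchAll helpers, and the multiGet function are all hypothetical; multiGet stands in for whatever batched lookup the client offers (in the HBase client that would presumably be the get call that takes a list of Get objects), and it is faked in main so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchFetchSketch {

    // Split ids into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    // Fetch every id using one multiGet call per batch.
    // multiGet is a hypothetical stand-in for a real batched lookup.
    static <K, V> List<V> fetchAll(List<K> ids, int batchSize,
                                   Function<List<K>, List<V>> multiGet) {
        List<V> results = new ArrayList<>(ids.size());
        for (List<K> batch : partition(ids, batchSize)) {
            results.addAll(multiGet.apply(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        int[] roundTrips = {0};
        // Fake lookup: counts calls and returns one placeholder value per id.
        List<String> facts = fetchAll(ids, 4, batch -> {
            roundTrips[0]++;
            List<String> out = new ArrayList<>();
            for (Integer id : batch) {
                out.add("fact-" + id);
            }
            return out;
        });
        System.out.println(facts.size() + " facts in " + roundTrips[0] + " round trips");
    }
}
```

The batch size is a tuning knob rather than a known-good value: larger batches mean fewer round trips but bigger responses, so it would need to be measured against the real row size and cache hit ratio asked about above.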
