Thank you, I'll try to implement all your advice. Thanks again and best regards.
On Fri, Oct 3, 2008 at 12:27 AM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> If this is the case, then certainly what is hurting you is (repeating what
> has been said before, but maybe it's clearer to you now):
>
> - Serialized round-trip RPC calls for each insert (this will eventually be
> handled with batched updates and/or parallelism in the client; for now, you
> would need to have multiple processes doing the writing... you will see a
> major improvement if you have multiple writing processes)
>
> - Inserting to a single region. As described before, you're only hitting a
> single server, so your writes are not being distributed at all. Lower your
> region max filesize to get splits sooner. Also, keep your eye on:
> https://issues.apache.org/jira/browse/HBASE-902
> This feature is intended for situations like this.
>
> -----Original Message-----
> From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 02, 2008 1:55 PM
> To: [email protected]
> Subject: Re: Hbase / Hadoop Tuning
>
> Hi. My webapp is simulating row-by-row operation, meaning it adds 100K rows
> in a loop. My time measurement starts on the line before the loop and ends
> on the line after it, so there is no webapp overhead in the measurement.
> But, sure, I'll look into it more deeply to verify that I'm not spending
> 1 or 2 ms on some other operation.
>
> Thank you and best regards.
>
> On Thu, Oct 2, 2008 at 11:36 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
>
> > In this case, it would definitely hurt your performance.
> >
> > One question. Have you done more detailed timings to determine where time
> > is spent? With the overhead of your webapp, and it streaming insertions
> > one row at a time, is it possible that a significant amount of time is
> > being spent before or after the HBase commit (significant in this case
> > could be 1-2 ms/row).
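[Editor's note: the "lower your region/filesize" advice above maps to the `hbase.hregion.max.filesize` property that Jim names later in the thread. A sketch of the hbase-site.xml fragment, assuming the 64MB value he suggests; 67108864 bytes = 64MB, and this is illustrative, not a tuned recommendation:]

```xml
<!-- hbase-site.xml: encourage earlier region splits (sketch only) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- default in HBase 0.18 is 256MB; 64MB makes regions split sooner,
       spreading writes across region servers earlier -->
  <value>67108864</value>
</property>
```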
> >
> > JG
> >
> > -----Original Message-----
> > From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, October 02, 2008 1:12 PM
> > To: [email protected]
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank you.
> > Regarding doing writes in MR jobs: the problem is that rows come to the
> > webapp one by one, and I can't accumulate them into one big batch update.
> > That means I would need to run an MR job for each single row; in this
> > case, will MR jobs help?
> >
> > Best regards.
> >
> > On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Responses inline below.
> > >
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, October 02, 2008 12:39 PM
> > > > To: [email protected]
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Thank you, Jim, for the quick answer.
> > > > 1) If I understand correctly, using 2 clients should let me improve
> > > > performance roughly twofold?
> > >
> > > I don't know if you will get 2x performance, but it will be greater
> > > than 1x.
> > >
> > > > 2) Currently, our webapp is an HBase client using HTable - is that
> > > > what you meant when you said "(HBase, not web) clients"?
> > >
> > > If multiple requests come into your webapp, and your webapp is
> > > multithreaded, you will not see a performance increase.
> > >
> > > If your webapp runs a different process for each request, you will see
> > > a performance increase because the RPC connection will not be shared
> > > and consequently will not block on the giant lock. That is why I
> > > recommended splitting up your job using Map/Reduce.
> > >
> > > > 3) Is 64MB for a single region a minimum size, or could it be less?
> > >
> > > It could be less, but that is the default block size for the Hadoop DFS.
> > > If you make it smaller, you might want to change the default block size
> > > for Hadoop as well.
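[Editor's note: since HBASE-748-style batching was not yet available, the accumulation Slava says he can't do can still be approximated on the client side: buffer rows as they arrive and flush them in one go once the buffer fills. A minimal sketch under that assumption; the flush callback is a hypothetical stand-in for whatever actually commits to HBase:]

```python
class BufferedWriter:
    """Accumulate single-row updates and flush them in batches.

    The flush callback is a placeholder standing in for the real
    commit (e.g. an HTable commit); nothing HBase-specific here.
    """

    def __init__(self, flush_cb, batch_size=1000):
        self.flush_cb = flush_cb
        self.batch_size = batch_size
        self.buffer = []

    def put(self, row, value):
        self.buffer.append((row, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand off the current batch and start a fresh buffer.
        if self.buffer:
            self.flush_cb(self.buffer)
            self.buffer = []


# Usage: 100 incoming rows with batches of 25 produce 4 commits
# instead of 100 round trips.
batches = []
w = BufferedWriter(batches.append, batch_size=25)
for i in range(100):
    w.put(("row-%05d" % i).encode(), b"x" * 1400)
w.flush()  # flush any remainder (none here)
```

The trade-off is durability: rows sitting in the buffer are lost if the client dies before a flush.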
> > > > 4) When is the fix for the RPC lock on concurrent operations in a
> > > > single client planned?
> > >
> > > This change is targeted for somewhere in the next 6 months according
> > > to the roadmap.
> > >
> > > > Thank you again and best regards.
> > > >
> > > > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > What you are storing is 140,000,000 bytes, so having multiple
> > > > > region servers will not help you, as a single region is only
> > > > > served by a single region server. By default, regions split
> > > > > when they reach 256MB. So until the region splits, all traffic
> > > > > will go to a single region server. You might try reducing the
> > > > > maximum file size to encourage region splitting by changing the
> > > > > value of hbase.hregion.max.filesize to 64MB.
> > > > >
> > > > > Using a single client will also limit write performance.
> > > > > Even if the client is multi-threaded, there is a big giant lock
> > > > > in the RPC mechanism which prevents concurrent requests (this
> > > > > is something we plan to fix in the future).
> > > > >
> > > > > Multiple clients do not block against one another the way
> > > > > multi-threaded clients currently do. So another way to increase
> > > > > write performance would be to run multiple (HBase, not web)
> > > > > clients, by either running multiple processes directly, or by
> > > > > utilizing a Map/Reduce job to do the writes.
> > > > >
> > > > > ---
> > > > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Slava Gorelik [mailto:[EMAIL PROTECTED]
> > > > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > > > To: [email protected]
> > > > > > Subject: Re: Hbase / Hadoop Tuning
> > > > > >
> > > > > > Hi. Thank you for the quick response.
> > > > > > We are using 7 machines (6 RedHat 5 and 1 SuSE Enterprise 10).
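[Editor's note: Jim's "multiple HBase client processes" advice amounts to partitioning the row keys up front so each process gets a disjoint slice and its own RPC connection. A sketch of just the partitioning step, under the assumption that process launch and the actual writes happen elsewhere:]

```python
def partition(keys, num_workers):
    """Split keys into num_workers contiguous, disjoint slices,
    one slice per writer process."""
    base, extra = divmod(len(keys), num_workers)
    slices, start = [], 0
    for w in range(num_workers):
        # The first `extra` workers absorb one leftover key each.
        size = base + (1 if w < extra else 0)
        slices.append(keys[start:start + size])
        start += size
    return slices


# Usage: 100,000 rows (as in the thread) split across 4 writer processes.
keys = ["row-%06d" % i for i in range(100000)]
work = partition(keys, 4)
```

Each worker would then open its own HTable connection, avoiding the shared-lock serialization described above; note the writes still land on one region until the table splits.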
> > > > > > Each machine is: 4 CPUs with 4 GB RAM and a 200 GB HD, connected
> > > > > > with a 1 Gb network interface.
> > > > > > All machines are in the same rack. On one machine (the master) we
> > > > > > are running Tomcat with one webapp that is adding 100,000 rows.
> > > > > > Nothing else is running. With no webapp running, the CPU load is
> > > > > > less than 1%.
> > > > > >
> > > > > > We are using HBase 0.18.0 and Hadoop 0.18.0.
> > > > > > The HBase cluster is one master and 6 region servers.
> > > > > >
> > > > > > Row addition is done by BatchUpdate and commit into a single
> > > > > > column family.
> > > > > > The data is a simple byte array (1400 bytes per row).
> > > > > >
> > > > > > Thank you and best regards.
> > > > > >
> > > > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Tell us more, Slava. HBase version, and how many regions do you
> > > > > > > have in your cluster?
> > > > > > >
> > > > > > > If small rows, your best boost will likely come when we support
> > > > > > > batching of updates: HBASE-748.
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > > Slava Gorelik wrote:
> > > > > > >
> > > > > > >> Hi All.
> > > > > > >> Our environment: 8 datanodes (1 is also the namenode),
> > > > > > >> 7 of them are also region servers and 1 is the master; default
> > > > > > >> replication is 3.
> > > > > > >> We have an application that writes heavily with relatively
> > > > > > >> small rows - about 10Kb.
> > > > > > >> Current performance is 100,000 rows in 580,000 ms, i.e.
> > > > > > >> 5.8 ms per row.
> > > > > > >> Is there any way to improve this performance by tuning or
> > > > > > >> tweaking HBase or Hadoop?
> > > > > > >>
> > > > > > >> Thank You and Best Regards.
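[Editor's note: a quick sanity check on the numbers in the thread (100,000 rows in 580,000 ms), plus a rough best-case projection for W independent writer processes, assuming perfect parallelism, which the single-region bottleneck discussed above will prevent in practice:]

```python
rows = 100000
total_ms = 580000
ms_per_row = total_ms / rows  # 5.8 ms per row, matching the figure reported


def ideal_time_ms(total_ms, writers):
    """Best-case wall time with `writers` perfectly parallel clients.

    An upper bound on the speedup only: real writers share one region
    (until it splits) and one set of region servers.
    """
    return total_ms / writers
```

So four fully parallel writers would, at best, cut the run from 580 s to 145 s; the region split and batching changes decide how close one gets to that.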
