Re: Multiple column families - scan performance

2017-08-22 Thread Partha
Ram, Yes, each column family in the 4 c/f table has only one column qualifier. Will try addColumn(byte[] fam, byte[] qual) and test how that performs. Don't have another version of hbase to test across releases, but will see if I can manage this. In test case to scan only rowkeys, are you
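Ram's addColumn suggestion, as a minimal sketch (assuming the hbase-client library and a reachable cluster; the table, family, and qualifier names are placeholders, and the Connection-based API shown is the 1.x client):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnScanTest {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("TABLE1"))) { // placeholder
            Scan scan = new Scan();
            // With exactly one qualifier per family, addColumn() is the narrowest
            // possible request: the scanner can seek directly to that one cell per
            // row instead of reading every cell in the family as addFamily() would.
            scan.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1")); // placeholder qualifier
            scan.setCaching(1000);
            scan.setCacheBlocks(false); // one-off scan; don't evict hot blocks
            long start = System.currentTimeMillis();
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    rows++;
                }
            }
            System.out.println(rows + " rows in "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```

Timing this variant against the addFamily() version on both tables is what the thread is asking for.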

Re: Multiple column families - scan performance

2017-08-22 Thread ramkrishna vasudevan
In HBase even if you say keyOnlyFilter there is a column family involved. In this case if the scan does not specify addfamily() then I think all the column families will be loaded. Regards Ram On Tue, Aug 22, 2017 at 6:47 PM, Partha wrote: > One other observation - even

Re: Multiple column families - scan performance

2017-08-22 Thread Partha
One other observation - even scanning 1MM rowkeys (using keyonlyfilter and firstkeyonlyfilter) takes 4x the time on 2nd table. No column family is queried at all in this test.. On Aug 21, 2017 10:47 PM, "Partha" wrote: > hbase(main):001:0> describe 'TABLE1' > Table
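For reference, a row-key-only scan like the one described above is typically built from the two filters combined, plus addFamily() to pin the scan to one family per Ram's reply (a sketch with a placeholder family name, assuming the hbase-client library):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyOnlyScan {
    static Scan build() {
        List<Filter> filters = Arrays.<Filter>asList(
                new FirstKeyOnlyFilter(), // emit only the first cell of each row
                new KeyOnlyFilter());     // strip the cell value, keep the key
        Scan scan = new Scan();
        scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));
        // Even a key-only scan still reads from some store; restricting it to one
        // family keeps the other families' HFiles out of the read path.
        scan.addFamily(Bytes.toBytes("cf1")); // placeholder family
        return scan;
    }
}
```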

Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
One more request would be to check the same test case with a new version of hbase - probably with the 1.3 or 1.2 latest. This is just to confirm if the problem that you see is across all releases. Because a simple test case reveals that with addFamily only the specified column is scanned and we

Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
Can you try one more thing - instead of addFamily try using addColumn(byte[] fam, byte[] qual). Since you are sure that there is only one qualifier. See how it works? Does it reduce the performance or increase the performance than the addFamily() and how is it related to the 1 CF case. Also just

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
hbase(main):001:0> describe 'TABLE1' Table TABLE1 is ENABLED TABLE1 COLUMN FAMILIES DESCRIPTION {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS =>

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
final Scan scan = new Scan(startInclusive, endExclusive) .addFamily(stage.getBytes()) .setCaching(DEFAULT_BATCH_SIZE) .setCacheBlocks(false); Here is the scan test code. This will return ~1MM rows from both tables, while limiting scan to a single column

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
addFamily only. There is only 1 column/qualifier per column family On Aug 21, 2017 2:05 PM, "Anoop John" wrote: In ur test are u using Scan#addColumn(byte [] family, byte [] qualifier) or it is addFamily(byte [] family) only? On Mon, Aug 21, 2017 at 10:02 PM, Partha

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
Will send across table statement and the test code. Pls let me know if you find anything from your test given the inputs so far. Note that column family has only 1 qualifier with json payload value of size 15KB. The column families use fastdiff encoding and gzip compression. Added user@ to this
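The family settings mentioned (FAST_DIFF encoding, gzip compression) map to admin-API calls along these lines (a sketch against the pre-2.0 hbase-client API; table and family names are placeholders). Both settings shrink the on-disk files at the cost of extra CPU on the read path, which is worth remembering when timing scans:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class CreateTableSketch {
    static HTableDescriptor describe() {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("TABLE1"));
        HColumnDescriptor cf = new HColumnDescriptor("cf1");
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // delta-encode keys within blocks
        cf.setCompressionType(Compression.Algorithm.GZ);      // gzip each block on disk
        cf.setMaxVersions(1);
        desc.addFamily(cf);
        return desc; // pass to Admin#createTable(desc) on an open Connection
    }
}
```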

Re: Multiple column families - scan performance

2017-08-17 Thread ramkrishna vasudevan
o scan the single > column > > family from the 1st table. In both cases, the scanner is bounded by a > start > > and stop key to scan 1MM rows. Performance did not change much even after > > running a major compaction on both tables. > > > > Though HBase doc and other

Re: Multiple column families - scan performance

2017-08-17 Thread Anoop John
doc and other tech forums recommend not using more than 1 > column family per table, nothing I have read so far suggests scan > performance will linearly degrade based on number of column families. Has > anyone else experienced this, and is there a simple explanation for this? > > To

Multiple column families - scan performance

2017-08-17 Thread Partha
did not change much even after running a major compaction on both tables. Though HBase doc and other tech forums recommend not using more than 1 column family per table, nothing I have read so far suggests scan performance will linearly degrade based on number of column families. Has anyone else

Multiple column families - scan performance

2017-08-14 Thread ps0618
1MM rows. Performance did not change much even after running a major compaction on both tables. Though HBase doc and other tech forums recommend not using more than 1 column family per table, nothing I have read so far suggests scan performance will linearly degrade based on number of column

Multiple column families - scan performance

2017-08-14 Thread Partha Sarathy
. Performance did not change much even after running a major compaction on both tables. Though HBase doc and other tech forums recommend not using more than 1 column family per table, nothing I have read so far suggests scan performance will linearly degrade based on number of column families. Has

Re: scan performance

2017-01-19 Thread Rajeshkumar J
Same one but now I think we have found the cause. we have one column qualifier and five columns in every table. We will add singlecolumnvaluefilter to the scan based on the input parameters. All the time scan is successful except when we add that specific column. We have tested other four columns

Re: scan performance

2017-01-19 Thread Yu Li
So the answer in the previous mail thread didn't resolve your problem, or is this a new one? If it's a new one, mind sharing more details? Thanks. Best Regards,

scan performance

2017-01-19 Thread Rajeshkumar J
I am using SingleColumnValueFilter for filtering based on some values. Based on this I am getting lease expired exception during scan. So is there any way to solve this?
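A lease-expired error during a filtered scan often means the region server spends longer than the scanner timeout skipping non-matching rows between two client next() calls. A sketch of the filter plus the usual client-side mitigation (the family, qualifier, and value are placeholders):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredScanSketch {
    static Scan build() {
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("cf"), Bytes.toBytes("col"),           // placeholders
                CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v1")); // placeholder value
        filter.setFilterIfMissing(true); // otherwise rows lacking the column pass through
        Scan scan = new Scan();
        scan.setFilter(filter);
        // Smaller caching batches mean each next() round-trip returns sooner,
        // renewing the scanner lease before it can expire.
        scan.setCaching(100);
        return scan;
    }
}
```

Raising hbase.client.scanner.timeout.period (with a matching RPC timeout) on client and server is the other common mitigation when the filtering is unavoidably slow.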

Re: Scan Performance Decreases Over Time

2016-10-10 Thread Ted Yu
performance decreases over time. > HBase connections are kept in a data access service (in Tomcat), and there > are table scan operations. The scan cost for each scan batch (~10 > parallel scans) changes as below: > day / avg. cost (ms) > 1: 56.213115 > 2: 43.69705

Scan Performance Decreases Over Time

2016-10-10 Thread 陆巍
Hi All, I met with a problem where the scan performance decreases over time. HBase connections are kept in a data access service (in Tomcat), and there are table scan operations. The scan cost for each scan batch (~10 parallel scans) changes as below: day / avg. cost (ms) - 1: 56.213115

Re: Multirange Scan performance

2016-08-30 Thread Ted Yu
range scans) handled (exmpl: Accumulo API has > setRanges function to set ranges for InputFormat)? > > > > -- > View this message in context: http://apache-hbase.679495.n3. > nabble.com/Multirange-Scan-performance-tp4082215.html > Sent from the HBase User mailing list archive at Nabble.com. >

Multirange Scan performance

2016-08-30 Thread daunnc
) handled (exmpl: Accumulo API has setRanges function to set ranges for InputFormat)? -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Multirange-Scan-performance-tp4082215.html Sent from the HBase User mailing list archive at Nabble.com.

Re: Inconsistent scan performance

2016-03-29 Thread james.johansville
w, or does it vary? > > Lastly, how big are the rows? > > Thanks. > > -- Lars > > From: James Johansville <james.johansvi...@gmail.com> > To: user@hbase.apache.org > Sent: Friday, March 25, 2016 12:23 PM > Subject: Re: Inconsistent scan performance

Re: Inconsistent scan performance

2016-03-27 Thread larsh
Sent: Friday, March 25, 2016 12:23 PM Subject: Re: Inconsistent scan performance Hello all, I have 13 RegionServers and presplit into 13 regions (which motivated my comment that I aligned my queries with the regionservers, which obviously isn't accurate). I have been testing using a multiple

Re: Inconsistent scan performance

2016-03-25 Thread Stack
is already in cache? > > > >> (Any previous scan or by cache on write) And there are no > concurrent > > > >> writes any way right? This much difference in time ! One > > > >> possibility is blocks avail or not avail in cache.. > > >

Re: Inconsistent scan performance

2016-03-25 Thread James Johansville
> > > >>>> Hello all, > > >>>> > > >>>> So, I wrote a Java application for HBase that does a partitioned > > >> full-table > > >>>> scan according to a set number of partitions. For example, if there > > are > > >> 20

Re: Inconsistent scan performance

2016-03-25 Thread Stack
> >> cover > >>>> an equal slice of the row identifier range. > >>>> > >>>> The rows are uniformly distributed throughout the RegionServers. > >>> > >>> > >>> How many RegionServers? How many Regions? Are Regio

Re: Inconsistent scan performance

2016-03-25 Thread Nicolas Liochon
en out? > > > > The disparity seems really wide. > > > > St.Ack > > > > > > > > > >> I >> confirmed this through the hbase shell. I have only one column family, > and >> each row has the same number of column qualifiers.

Re: Inconsistent scan performance

2016-03-24 Thread Anoop John
> St.Ack > > > > >> I >> confirmed this through the hbase shell. I have only one column family, and >> each row has the same number of column qualifiers. >> >> My problem is that the individual scan performance is wildly inconsistent >> even though they

Re: Inconsistent scan performance

2016-03-24 Thread Stack
all partitions on one machine and then run your client, do the timings even out? The disparity seems really wide. St.Ack > I > confirmed this through the hbase shell. I have only one column family, and > each row has the same number of column qualifiers. > > My problem is that the indiv

Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
are launched that > cover > > > an equal slice of the row identifier range. > > > > > > The rows are uniformly distributed throughout the RegionServers. I > > > confirmed this through the hbase shell. I have only one column family, > >

Re: Inconsistent scan performance

2016-03-24 Thread James Johansville
te full scans are launched that cover > > an equal slice of the row identifier range. > > > > The rows are uniformly distributed throughout the RegionServers. I > > confirmed this through the hbase shell. I have only one column family, > and > > each row has the same number of

Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
> The rows are uniformly distributed throughout the RegionServers. I > confirmed this through the hbase shell. I have only one column family, and > each row has the same number of column qualifiers. > > My problem is that the individual scan performance is wildly inconsistent > even tho

Inconsistent scan performance

2016-03-24 Thread James Johansville
are uniformly distributed throughout the RegionServers. I confirmed this through the hbase shell. I have only one column family, and each row has the same number of column qualifiers. My problem is that the individual scan performance is wildly inconsistent even though they fetch approximately a similar

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-12 Thread Demai Ni
Andrey, thanks. You are right that I am using Thrift v1. I was following example under : hbase-examples/src/main/cpp/DemoClient.cpp. It looks like pretty old, and actually its scan example: scanner = client.scannerOpenWithStop(t, 00020, 00040, columnNames, dummyAttributes); doesn't work. I

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-09 Thread Demai Ni
Andrey and all, thanks for the input. Andrey, if possible, do you mind share your code segment so I can follow the setting on your side? I have exactly the same thought when face the result first time. I was expecting a little bit performance issue (10~20%) when using Thrift(C++), and not as

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-09 Thread Andrey Stepachev
Sorry Demai, I have no access to that code currently. But what you described seems that you use thrift v1. I'd recommend to use thrift2. Also it is a good idea to check thrift server configuration: 1. blocking/nonblocking/hsha, and framed or not 2. size of thread pool On Mon, Mar 9, 2015 at

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Mike Axiak
If you're going the JNI route, the best bet is to embed a VM in your C project. You use java -s -p to create the required header files and compile linking against the java library. This article talks about how to talk from C to Java:

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Andrey Stepachev
Hi Demai. Thats seems odd for me, in my tests I got very similar performance. I'd like to suggest to check that scans have identical parameters (cache size in particular). That can bring very different performance in you case. Thanks. On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak m...@axiak.net

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Michael Segel
JNI example? I don’t have one… my client’s own the code so I can’t take it with me and share. (The joys of being a consultant means you can’t take it with you and you need to make sure you don’t xfer IP accidentally. ) Maybe in one of the HBase books? Or just google for a JNI example on

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Nick Dimiduk
You can try the REST gateway, though it has the same basic architecture as the Thrift gateway. Maybe the details work out in your favor over REST. On Fri, Mar 6, 2015 at 11:31 PM, nidmgg nid...@gmail.com wrote: Stack, Thanks for the quick response. Well, the extra layer really kill the

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Michael Segel
Or you could try a java connection wrapped by JNI so you can call it from your C++ app. On Mar 7, 2015, at 1:00 PM, Nick Dimiduk ndimi...@gmail.com wrote: You can try the REST gateway, though it has the same basic architecture as the thrift gateway. May be the details work out in your

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Mike Axiak
What if you install the thrift server locally on every C++ client machine? I'd imagine performance should be similar to native java performance at that point. -Mike On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel michael_se...@hotmail.com wrote: Or you could try a java connection wrapped by JNI

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Demai Ni
Nick, thanks. I will give REST a try. However, if it use the same design, the result probably will be the same. Michael, I was thinking about the same thing through JNI. Is there an example I can follow? Mike (Axiak), I run the C++ client on the same linux machine as the hbase and thrift. The

significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-06 Thread Demai Ni
Hi guys, I am trying to get a rough idea of the performance comparison between the C++ and Java clients when accessing an HBase table, and am surprised to find that Thrift (C++) is 4X slower. The performance result is: C++: real 16m11.313s; user 5m3.642s; sys 2m21.388s. Java: real

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-06 Thread nidmgg
Stack, Thanks for the quick response. Well, the extra layer really kills the performance. The 'hop' is so expensive. Is there another C/C++ API to try out? I saw there is a JIRA, HBASE-1015, but it was inactive for a while. Demai Stack st...@duboce.net wrote: Is it because of the 'hop'?  Java

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-06 Thread Stack
Is it because of the 'hop'? Java goes against RS. The thrift C++ goes to a thriftserver which hosts a java client and then it goes to the RS? St.Ack On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni nid...@gmail.com wrote: hi, guys, I am trying to get a rough idea about the performance comparison

Re: Scan performance

2013-10-19 Thread Jean-Marc Spaggiari
to test scan performance with 0.94.9 with around 6000 rows X 40 columns and FuzzyRowFilter gave us 2-4 times better performance. I was able to test this offline without any problems. However, once I turned it on in our development cluster, we noticed that with some row keys that should

Re: Client Get vs Coprocessor scan performance

2013-08-19 Thread Kiru Pakkirisamy
kirupakkiris...@yahoo.com Cc: user@hbase.apache.org user@hbase.apache.org Sent: Sunday, August 18, 2013 5:34 PM Subject: Re: Client Get vs Coprocessor scan performance Kiru, What's your column family name? Just to confirm, the column qualifier of your key value is C_10345 and this stores a value

Re: Client Get vs Coprocessor scan performance

2013-08-19 Thread James Taylor
To: Kiru Pakkirisamy kirupakkiris...@yahoo.com Cc: user@hbase.apache.org user@hbase.apache.org Sent: Sunday, August 18, 2013 5:34 PM Subject: Re: Client Get vs Coprocessor scan performance Kiru, What's your column family name? Just to confirm, the column qualifier of your key value is C_10345

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Ted Yu
From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Sent: Saturday, August 17, 2013 4:19 PM Subject: Re: Client Get vs Coprocessor scan performance HBASE-6870 targeted whole table scanning for each coprocessorService call which exhibited itself through

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Pakkirisamy | webcloudtech.wordpress.com From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Sent: Saturday, August 17, 2013 4:19 PM Subject: Re: Client Get vs Coprocessor scan performance HBASE-6870 targeted whole table scanning

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
Coprocessor scan performance bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on the whole length of the key) In this case the Get's are very selective. The number of rows FuzzyRowFilter was evaluated against would be much higher. It would be nice if you remember the time each

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
Subject: Re: Client Get vs Coprocessor scan performance Would be interesting to compare against Phoenix's Skip Scan (http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html) which does a scan through a coprocessor and is more than 2x faster than multi Get (plus handles multi

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Pakkirisamy kirupakkiris...@yahoo.com Sent: Sunday, August 18, 2013 11:44 AM Subject: Re: Client Get vs Coprocessor scan performance Would be interesting to compare against Phoenix's Skip Scan ( http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html ) which does

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
...@yahoo.com Sent: Sunday, August 18, 2013 2:07 PM Subject: Re: Client Get vs Coprocessor scan performance Kiru, If you're able to post the key values, row key structure, and data types you're using, I can post the Phoenix code to query against it. You're doing some kind of aggregation too, right

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
-- *From:* James Taylor jtay...@salesforce.com *To:* user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com *Sent:* Sunday, August 18, 2013 2:07 PM *Subject:* Re: Client Get vs Coprocessor scan performance Kiru, If you're able to post the key values, row key

Re: Client Get vs Coprocessor scan performance

2013-08-17 Thread Asaf Mesika
Pakkirisamy | webcloudtech.wordpress.com From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Thursday, August 8, 2013 8:40 PM Subject: Re: Client Get vs Coprocessor scan performance Can you give us

Re: Client Get vs Coprocessor scan performance

2013-08-17 Thread Ted Yu
@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Thursday, August 8, 2013 8:40 PM Subject: Re: Client Get vs Coprocessor scan performance Can you give us a bit more information ? How do you deliver the 55 rowkeys to your endpoint ? How many regions do you have

Re: Client Get vs Coprocessor scan performance

2013-08-17 Thread Kiru Pakkirisamy
Sent: Saturday, August 17, 2013 4:19 PM Subject: Re: Client Get vs Coprocessor scan performance HBASE-6870 targeted whole table scanning for each coprocessorService call which exhibited itself through: HTable#coprocessorService - getStartKeysInRange - getStartEndKeys - getRegionLocations

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread James Taylor
: Client Get vs Coprocessor scan performance I think this fixes my issues. On our dev cluster what used to take 1200 msec is now in the 700-800 msec region. Thanks again. I will be soon deploying this to our Performance cluster where our query is at 15 secs range. Regards, - kiru Kiru

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread Kiru Pakkirisamy
kirupakkiris...@yahoo.com Sent: Monday, August 12, 2013 9:41 AM Subject: Re: Client Get vs Coprocessor scan performance Hey Kiru, Another option for you may be to use Phoenix (https://github.com/forcedotcom/phoenix). In particular, our skip scan may be what you're looking for:  http://phoenix

Re: Client Get vs Coprocessor scan performance

2013-08-11 Thread Kiru Pakkirisamy
Pakkirisamy | webcloudtech.wordpress.com From: Kiru Pakkirisamy kirupakkiris...@yahoo.com To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, August 9, 2013 1:04 PM Subject: Re: Client Get vs Coprocessor scan performance I think this fixes my issues. On our

Re: Client Get vs Coprocessor scan performance

2013-08-09 Thread Kiru Pakkirisamy
, August 8, 2013 10:44 PM Subject: Re: Client Get vs Coprocessor scan performance I think you need HBASE-6870 which went into 0.94.8 Upgrading should boost coprocessor performance. Cheers On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Ted, Here is the method

Re: Client Get vs Coprocessor scan performance

2013-08-09 Thread Wukang Lin
From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Thursday, August 8, 2013 8:40 PM Subject: Re: Client Get vs Coprocessor scan performance Can you give us a bit more information ? How do you deliver the 55

Re: Client Get vs Coprocessor scan performance

2013-08-09 Thread Kiru Pakkirisamy
...@gmail.com To: user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Thursday, August 8, 2013 11:00 PM Subject: Re: Client Get vs Coprocessor scan performance Hi Kiru,     Sorry for my poor english.     If you perform a batch GET using HTable.get(ListGet), it not a really single

Re: Client Get vs Coprocessor scan performance

2013-08-09 Thread Kiru Pakkirisamy
From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, August 8, 2013 10:44 PM Subject: Re: Client Get vs Coprocessor scan performance I think you need HBASE-6870 which went

Re: Scan performance

2013-08-08 Thread Viral Bajaria
in production. Thanks, Viral On Tue, Jul 16, 2013 at 8:07 PM, Tony Dean tony.d...@sas.com wrote: I was able to test scan performance with 0.94.9 with around 6000 rows X 40 columns and FuzzyRowFilter gave us 2-4 times better performance. I was able to test this offline without any problems. However, once

Client Get vs Coprocessor scan performance

2013-08-08 Thread Kiru Pakkirisamy
Hi, I am finding an odd behavior with the Coprocessor performance lagging a client side Get. I have a table with 50 rows. Each have variable # of columns in one column family (in this case about 60 columns in total are processed) When I try to get specific 55 rows, the client side

Re: Client Get vs Coprocessor scan performance

2013-08-08 Thread Ted Yu
Can you give us a bit more information ? How do you deliver the 55 rowkeys to your endpoint ? How many regions do you have for this table ? What HBase version are you using ? Thanks On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.comwrote: Hi, I am finding an odd

Re: Client Get vs Coprocessor scan performance

2013-08-08 Thread Kiru Pakkirisamy
Get vs Coprocessor scan performance Can you give us a bit more information ? How do you deliver the 55 rowkeys to your endpoint ? How many regions do you have for this table ? What HBase version are you using ? Thanks On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy kirupakkiris

Re: Client Get vs Coprocessor scan performance

2013-08-08 Thread Ted Yu
| webcloudtech.wordpress.com From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Thursday, August 8, 2013 8:40 PM Subject: Re: Client Get vs Coprocessor scan performance Can you give us a bit more information

RE: Scan performance

2013-07-16 Thread Tony Dean
I was able to test scan performance with 0.94.9 with around 6000 rows X 40 columns and FuzzyRowFilter gave us 2-4 times better performance. I was able to test this offline without any problems. However, once I turned it on in our development cluster, we noticed that with some row keys

RE: Scan performance

2013-07-16 Thread Tony Dean
, July 16, 2013 9:29 PM To: user@hbase.apache.org Subject: RE: Scan performance I was able to test scan performance with 0.94.9 with around 6000 rows X 40 columns and FuzzyRowFilter gave us 2-4 times better performance. I was able to test this offline without any problems. However, once I turned

RE: Scan performance

2013-07-03 Thread Tony Dean
Thanks Ted. -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 02, 2013 6:11 PM To: user@hbase.apache.org Subject: Re: Scan performance Tony: Take a look at http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes

RE: Scan performance

2013-07-02 Thread Tony Dean
The following information is what I discovered from Scan performance testing. Setup --- row key format: position1,position2,position3 where position1 is a fixed literal, and position2 and position3 are variable data. I have created data with 6000 rows with ~40 columns in each row

Re: Scan performance

2013-07-02 Thread Ted Yu
Tony: Take a look at http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ Cheers On Tue, Jul 2, 2013 at 2:31 PM, Tony Dean tony.d...@sas.com wrote: The following information is what I discovered from Scan performance testing. Setup
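For the row-key layout described in this thread (position1,position2,position3 with position1 a fixed literal), the suggested FuzzyRowFilter is built from a key template plus a byte mask, where a 0 mask byte means the key byte must match and a 1 means any byte. A sketch with a made-up 10-byte template; note FuzzyRowFilter assumes fixed-width keys:

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyScanSketch {
    static Scan build() {
        // Template: fix position1 ("P1,") and position3 (",P3"), wildcard position2.
        byte[] key  = Bytes.toBytes("P1,????,P3");    // placeholder template
        byte[] mask = {0, 0, 0, 1, 1, 1, 1, 0, 0, 0}; // 0 = must match, 1 = any byte
        Scan scan = new Scan();
        scan.setFilter(new FuzzyRowFilter(
                Arrays.asList(new Pair<byte[], byte[]>(key, mask))));
        return scan;
    }
}
```

The filter can seek past whole runs of non-matching keys server-side, which is where the 2-4x speedup reported later in the thread comes from.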

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread lars hofhansl
Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Enis Söztutar
On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl la...@apache.org wrote: Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Bryan Keller
...@apache.org wrote: Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Bryan Keller
brya...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Ted Yu
...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected i suppose. I'm

Re: Poor HBase map-reduce scan performance

2013-06-28 Thread lars hofhansl
: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected i suppose. I'm seeing about 3.6x faster performance vs TableInputFormat. Also, HBase

Re: Poor HBase map-reduce scan performance

2013-06-25 Thread Bryan Keller
To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight

RE: Scan performance

2013-06-24 Thread Tony Dean
Hi James, I do plan on looking more closely at Phoenix for SQL access to HBase. Thanks. -Original Message- From: James Taylor [mailto:jtay...@salesforce.com] Sent: Saturday, June 22, 2013 1:18 PM To: user@hbase.apache.org Subject: Re: Scan performance Hi Tony, Have you had a look

RE: Scan performance

2013-06-24 Thread Tony Dean
, 2013 9:24 AM To: user@hbase.apache.org Subject: Re: Scan performance essential column families help when you filter on one column but want to return *other* columns for the rows that matched the column. Check out HBASE-5416. -- Lars From: Vladimir Rodionov

Re: Scan performance

2013-06-24 Thread lars hofhansl
tony.d...@sas.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Monday, June 24, 2013 1:48 PM Subject: RE: Scan performance Lars, I'm waiting for some time to exchange out hbase jars in cluster (that support FuzzyRow filter) in order to try out

RE: Scan performance

2013-06-24 Thread Tony Dean
@hbase.apache.org Subject: Re: Scan performance RowFilter can help. It depends on the setup. RowFilter skip all column of the row when the row key does not match. That will help with IO *if* your rows are larger than the HFile block size (64k by default). Otherwise it still needs to touch each block

Re: Scan performance

2013-06-22 Thread Anoop John
again. -Tony -Original Message- From: Vladimir Rodionov [mailto:vrodio...@carrieriq.com] Sent: Friday, June 21, 2013 8:00 PM To: user@hbase.apache.org; lars hofhansl Subject: RE: Scan performance Lars, I thought that column family is the locality group and placement columns which

Re: Scan performance

2013-06-22 Thread lars hofhansl
@hbase.apache.org; lars hofhansl la...@apache.org Sent: Friday, June 21, 2013 5:00 PM Subject: RE: Scan performance Lars, I thought that column family is the locality group and placement columns which are frequently accessed together into the same column family (locality group) is the obvious performance

Re: Scan performance

2013-06-22 Thread lars hofhansl
...@gmail.com To: user@hbase.apache.org Sent: Friday, June 21, 2013 11:58 PM Subject: Re: Scan performance Have a look at FuzzyRowFilter -Anoop- On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean tony.d...@sas.com wrote: I understand more, but have additional questions about the internals... So

Re: Scan performance

2013-06-22 Thread James Taylor
to a specific key value. The latter is what FuzzyRowFilter does. -- Lars From: Anoop John anoop.hb...@gmail.com To: user@hbase.apache.org Sent: Friday, June 21, 2013 11:58 PM Subject: Re: Scan performance Have a look at FuzzyRowFilter -Anoop

Scan performance

2013-06-21 Thread Tony Dean
Hi, I hope that you can shed some light on these 2 scenarios below. I have 2 small tables of 6000 rows. Table 1 has only 1 column in each of its rows. Table 2 has 40 columns in each of its rows. Other than that the two tables are identical. In both tables there is only 1 row that contains a

RE: Scan performance

2013-06-21 Thread Vladimir Rodionov
family and all other column in another column family. In that case your scan performance should be close identical. -- Lars From: Tony Dean tony.d...@sas.com To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, June 21, 2013 2:08 PM Subject: Scan

RE: Scan performance

2013-06-21 Thread Tony Dean
, 2013 8:00 PM To: user@hbase.apache.org; lars hofhansl Subject: RE: Scan performance Lars, I thought that column family is the locality group and placement columns which are frequently accessed together into the same column family (locality group) is the obvious performance improvement tip. What

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
the poor scan performance against a table.

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Ted Yu
and bypass the regionservers. This could potentially give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
address the poor scan performance against a table.

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
really address the poor scan performance against a table.

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
to scan the HDFS files directly and bypass the regionservers. This could potentially give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.
