Sure, no problem. One addition: depending on the cardinality of your
priority column, you may want to salt your table to prevent hotspotting,
since you'll have a monotonically increasing date in the key. To do that,
just add SALT_BUCKETS=n to your CREATE TABLE statement, where n is the
number of machines in
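For illustration, a minimal sketch of that through the Phoenix JDBC driver
(the table name, columns, and bucket count are invented placeholders;
SALT_BUCKETS itself is a real Phoenix table property):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SaltedTable {
  public static void main(String[] args) throws Exception {
    // "localhost" stands in for your ZooKeeper quorum.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement stmt = conn.createStatement()) {
      // SALT_BUCKETS=n pre-splits the table into n buckets and prefixes
      // each row key with a one-byte hash, so the monotonically
      // increasing date no longer funnels every write into one region.
      stmt.execute(
          "CREATE TABLE IF NOT EXISTS url_queue ("
        + " priority INTEGER NOT NULL,"
        + " added DATE NOT NULL,"
        + " url VARCHAR NOT NULL"
        + " CONSTRAINT pk PRIMARY KEY (priority, added, url))"
        + " SALT_BUCKETS = 4");
    }
  }
}

Reads that expect rows back in (priority, added) order then have to merge
across the buckets, which is the trade-off raised further down this thread.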
Jesse, James, Lars,
after looking around a bit, and in particular looking into Phoenix (which
I find very interesting), assuming that you want secondary indexing on
HBase without adding other infrastructure, there really isn't a lot of
choice: either go with a region-level (and
Is there any data on how RLI (or in particular Phoenix) query throughput
correlates with the number of region servers assuming homogeneously
distributed data?
Phoenix is yet to add RLI; currently it has global indexing only. Correct,
James?
The RLI implementation from Huawei (HIndex) has some numbers wrt
Here are some performance numbers with RLI.
No. of region servers: 4
Data per region: 2 GB

Regions/RS | Total regions | Block size (KB) | Rows matching value | Time taken (sec)
        50 |           200 |              64 |                 199 |              102
        50 |           200 |               8 |                 199 |               35
       100 |           400 |               8 |                 350 |               95
       200 |           800 |               8 |                 353 |              153
Without a secondary index, the scan is
What would be of most interest is how the time taken scales with the
number of region servers (keeping the number of matching rows constant).
-Anoop-
On Fri, Jan 3, 2014 at 3:49 PM, rajeshbabu chintaguntla
rajeshbabu.chintagun...@huawei.com wrote:
Here are some performance numbers with
Interesting. This is exactly what I'm doing ;)
I'm using 3 tables to achieve this.
One table with the URLs already crawled (80 million), one URL with the URLs
to crawl (2 billion), and one URL with the URLs being processed. I'm not
running any SQL requests against my dataset but I have MR jobs
What is generally of interest: RLI or global-level indexing? I know it
depends on the use case, but is there a common need?
On Fri, Jan 3, 2014 at 4:31 PM, Anoop John anoop.hb...@gmail.com wrote:
What would be of most interest is how the time taken scales with the
number of region servers (keeping the number of matching rows constant),
bq. One URL ...
I guess you mean one table ...
Cheers
On Jan 3, 2014, at 4:19 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Interesting. This is exactly what I'm doing ;)
I'm using 3 tables to achieve this.
One table with the URLs already crawled (80 million), one URL with the
Are the regions scanned in parallel?
On Friday, January 3, 2014, rajeshbabu chintaguntla wrote:
Here are some performance numbers with RLI.
No. of region servers: 4
Data per region: 2 GB

Regions/RS | Total regions | Block size (KB) | Rows matching value | Time taken (sec)
        50 |           200 |
I think both approaches should be provided to HBase users.
They are new features that would each find their proper usage scenarios.
Cheers
On Jan 3, 2014, at 5:48 AM, ramkrishna vasudevan
ramkrishna.s.vasude...@gmail.com wrote:
What is generally of interest: RLI or global-level indexing? I know it
No, the regions are scanned sequentially.
From: Asaf Mesika [asaf.mes...@gmail.com]
Sent: Friday, January 03, 2014 7:26 PM
To: user@hbase.apache.org
Subject: Re: secondary index feature
Are the regions scanned in parallel?
On Friday, January 3, 2014,
Yes, sorry ;) Thanks for the correction.
Should have been:
One table with the URLs already crawled (80 million), one table with the
URLs to crawl (2 billion), and one table with the URLs being processed.
I'm not running any SQL requests against my dataset but I have MR jobs
doing many different
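A rough sketch of what one of those MR jobs over an HBase table can look
like with the standard TableMapReduceUtil entry point (0.96-era API; the
table name and the per-row work here are invented placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CountToCrawl {
  // Counts rows in the "to crawl" table; a stand-in for whatever the
  // real per-row work is.
  static class UrlMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      ctx.getCounter("crawl", "urls").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "count-to-crawl");
    job.setJarByClass(CountToCrawl.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips per mapper
    scan.setCacheBlocks(false);  // don't churn the block cache from MR
    TableMapReduceUtil.initTableMapperJob("to_crawl", scan, UrlMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}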
See this thread:
http://search-hadoop.com/m/LviZD1WPToG/Snappy+libhadoopsubj=RE+Setting+up+Snappy+compression+in+Hadoop
On Jan 3, 2014, at 3:20 AM, 张玉雪 zhangyuxue123...@163.com wrote:
Hi:
When I used Hadoop 2.2.0 and HBase 0.96.1.1 with Snappy
compression,
I followed
Shameless plug ;)
http://www.spaggiari.org/index.php/hbase/how-to-install-snappy-with-1
Keep us posted.
2014/1/3 Ted Yu yuzhih...@gmail.com
See this thread:
http://search-hadoop.com/m/LviZD1WPToG/Snappy+libhadoopsubj=RE+Setting+up+Snappy+compression+in+Hadoop
On Jan 3, 2014, at 3:20 AM,
Clicking on a specific table name in the Master WebUI in 0.96.1.1 gives me:
HTTP ERROR 404
Problem accessing /table.jsp. Reason:
/table.jsp
Only me? Or should I open a JIRA?
JM
Which tarball did you expand - hadoop1 or hadoop2?
Cheers
On Fri, Jan 3, 2014 at 10:15 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Clicking on a specific table name in the Master WebUI in 0.96.1.1 gives me:
HTTP ERROR 404
Problem accessing /table.jsp. Reason:
/table.jsp
Hadoop 2.
2014/1/3 Ted Yu yuzhih...@gmail.com
Which tarball did you expand - hadoop1 or hadoop2?
Cheers
On Fri, Jan 3, 2014 at 10:15 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Clicking on a specific table name in the Master WebUI in 0.96.1.1 gives
me:
HTTP ERROR 404
Couple of notes:
1. When updating the status you essentially add a new row key into HBase;
I would give that up altogether. The essential requirement seems to be
retrieving a list of URLs in a certain order.
2. Wouldn't salting ruin the required sort order (priority, date added)?
On Friday,
On Fri, Jan 3, 2014 at 10:50 AM, Asaf Mesika asaf.mes...@gmail.com wrote:
Couple of notes:
1. When updating the status you essentially add a new row key into HBase;
I would give that up altogether. The essential requirement seems to be
retrieving a list of URLs in a certain order.
Not
When scanning in the order of an index with RLI, it seems there is no
alternative but to involve all regions, and this essentially has to happen
in parallel, as otherwise you might not get what you wanted. Also,
for a single Get, it seems (as Lars pointed out in
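A client-side sketch of that scatter-gather, under the assumption that one
simply fans a Scan out over every region range with the plain 0.96 client
API and sums the partial results (the table, family, qualifier, and matched
value are invented; a real RLI implementation pushes the per-region work
into coprocessors instead of doing it from the client):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class ScatterGather {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final byte[] family = Bytes.toBytes("f");  // invented column family
    final byte[] qual = Bytes.toBytes("v");    // invented qualifier
    final byte[] match = Bytes.toBytes("x");   // invented value to look up

    // One scan per region, all in flight at once: the scatter phase.
    ExecutorService pool = Executors.newFixedThreadPool(16);
    List<Future<Integer>> parts = new ArrayList<>();
    try (HTable table = new HTable(conf, "indexed_table")) {
      Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
      for (int i = 0; i < keys.getFirst().length; i++) {
        final byte[] start = keys.getFirst()[i];
        final byte[] stop = keys.getSecond()[i];
        parts.add(pool.submit(new Callable<Integer>() {
          public Integer call() throws Exception {
            int hits = 0;
            // HTable is not thread-safe, so each task opens its own.
            try (HTable t = new HTable(conf, "indexed_table");
                 ResultScanner rs = t.getScanner(new Scan(start, stop))) {
              for (Result r : rs) {
                if (Bytes.equals(r.getValue(family, qual), match)) hits++;
              }
            }
            return hits;
          }
        }));
      }
    }
    int total = 0;
    for (Future<Integer> f : parts) total += f.get();  // the gather phase
    pool.shutdown();
    System.out.println("matching rows: " + total);
  }
}

If results must come back in index order, the gather side additionally
needs a merge sort over the per-region streams, which is why the
parallelism matters here.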
Hi Henning,
Phoenix maintains a global index. It is essentially maintaining another
HBase table for you with a different row key (and a subset of your data
table columns that are covered). When an index is used by Phoenix, it is
*exactly* like querying a data table (that's what Phoenix does - it
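For reference, the shape of such a covered global index in Phoenix DDL,
again through JDBC (the index, table, and column names are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CoveredIndex {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement stmt = conn.createStatement()) {
      // The index is itself an HBase table keyed on (status, added);
      // INCLUDE copies title into it, so a query touching only these
      // columns never has to go back to the data table.
      stmt.execute("CREATE INDEX idx_status ON crawled_urls (status, added)"
          + " INCLUDE (title)");
    }
  }
}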
Hi James,
this is a little embarrassing... I even browsed through the code and read
it as implementing a region-level index.
But now at least I get the restrictions mentioned for using the covered
indexes.
Thanks for clarifying. Guess I need to browse the code a little harder ;-)
Henning
The document is far from complete. It doesn't mention that the default
Hadoop binary package is compiled without Snappy support and that you need
to compile it with the Snappy option yourself. Actually, it didn't work
with any native libs on a 64-bit OS, as the libhadoop.so in the binary
package is only for 32-bit.
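A quick way to check, from Java, whether the Hadoop you're running was
actually built with native Snappy support (NativeCodeLoader is a standard
Hadoop 2.x class; HBase also ships
org.apache.hadoop.hbase.util.CompressionTest for the same job):

import org.apache.hadoop.util.NativeCodeLoader;

public class SnappyCheck {
  public static void main(String[] args) {
    // libhadoop.so has to load first; a 32-bit bundled library on a
    // 64-bit OS fails right here, which is the symptom described above.
    if (!NativeCodeLoader.isNativeCodeLoaded()) {
      System.out.println("native libhadoop not loaded");
      return;
    }
    // Reports whether libhadoop was compiled with Snappy support.
    System.out.println("snappy supported: "
        + NativeCodeLoader.buildSupportsSnappy());
  }
}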
hi James,
Phoenix seems great, but it's now only an experimental project. I
want to use only HBase. Could you tell me the difference between Phoenix
and HBase? If I use HBase only, how should I design the schema and what
extra things do I need for my goal? Thank you.
On Sat, Jan 4, 2014 at 3:41 AM, James
Hi LiLi,
Phoenix isn't an experimental project. We're on our 2.2 release, and many
companies (including the company for which I'm employed, Salesforce.com)
use it in production today.
Thanks,
James
On Fri, Jan 3, 2014 at 11:39 PM, Li Li fancye...@gmail.com wrote:
hi James,
Phoenix seems
So what's the relationship between Phoenix and HBase? Something like Hadoop and Hive?
On Sat, Jan 4, 2014 at 3:43 PM, James Taylor jtay...@salesforce.com wrote:
Hi LiLi,
Phoenix isn't an experimental project. We're on our 2.2 release, and many
companies (including the company for which I'm