Announcing Phoenix: A SQL layer over HBase

2013-01-30 Thread James Taylor
roadmap: https://github.com/forcedotcom/phoenix/wiki#wiki-roadmap. We welcome feedback and contributions to Phoenix from the community and look forward to working together. Regards, James Taylor @JamesPlusPlus

Re: Announcing Phoenix: A SQL layer over HBase

2013-02-01 Thread James Taylor
...@mapbased.com wrote: Great tool, I will try it later. Thanks for sharing! 2013/1/31 Devaraj Das d...@hortonworks.com Congratulations, James. We will surely benefit from this tool. On Wed, Jan 30, 2013 at 1:04 PM, James Taylor jtay...@salesforce.com wrote: We are pleased to announce the immediate

Re: Parallel scan in HBase

2013-02-01 Thread James Taylor
If you run a SQL query that does aggregation (i.e. uses a built-in aggregation function like COUNT or does a GROUP BY), Phoenix will orchestrate the running of a set of queries in parallel, segmented along your row key (driven by the start/stop key plus region boundaries). We take advantage of
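
For illustration (not from the original thread), an aggregate of this kind through Phoenix's JDBC driver might look like the following; the table and column names are invented:

    // Phoenix runs the scan for this aggregate in parallel, chunked by row
    // key and region boundaries, and merges the partial results for you.
    // Assumes java.sql.* imports.
    Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register driver
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    ResultSet rs = conn.createStatement().executeQuery(
        "SELECT host, COUNT(*) FROM event_log GROUP BY host");
    while (rs.next()) {
        System.out.println(rs.getString(1) + ": " + rs.getLong(2));
    }
    conn.close();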

Re: How would you model this in Hbase?

2013-02-06 Thread James Taylor
Another approach would be to use Phoenix (http://github.com/forcedotcom/phoenix). You can model your schema as you would in the relational world, but you get the horizontal scalability of HBase. James On 02/06/2013 01:49 PM, Michael Segel wrote: Overloading the time stamp aka the

independent scans to same region processed serially

2013-02-08 Thread James Taylor
Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single
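
A minimal sketch of the repro being described, assuming the 0.94-era client API and a table 't' whose single region spans the key ranges below:

    // Two scans over disjoint ranges of the same region, issued from separate
    // threads. Assumes org.apache.hadoop.hbase.client.*, Bytes, and
    // java.util.concurrent imports.
    final Configuration conf = HBaseConfiguration.create();
    final byte[][] bounds = { Bytes.toBytes("a"), Bytes.toBytes("m"), Bytes.toBytes("z") };
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < 2; i++) {
        final int slot = i;
        pool.submit(new Runnable() {
            public void run() {
                try {
                    // HTable isn't thread-safe, so each thread gets its own
                    HTable table = new HTable(conf, "t");
                    ResultScanner scanner =
                        table.getScanner(new Scan(bounds[slot], bounds[slot + 1]));
                    for (Result r : scanner) { /* drain */ }
                    scanner.close();
                    table.close();
                } catch (IOException e) { throw new RuntimeException(e); }
            }
        });
    }
    pool.shutdown();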

Re: independent scans to same region processed serially

2013-02-08 Thread James Taylor
(https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4. I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?) -- Lars From: James Taylor jtay

Re: independent scans to same region processed serially

2013-02-09 Thread James Taylor
- Original Message - From: James Taylor jtay...@salesforce.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Friday, February 8, 2013 9:52 PM Subject: Re: independent scans to same region processed serially All data is in the blockcache

Re: independent scans to same region processed serially

2013-02-10 Thread James Taylor
Filed https://issues.apache.org/jira/browse/HBASE-7805 with a test case attached. It occurs only if the table has a region observer coprocessor. James On 02/09/2013 11:04 AM, lars hofhansl wrote: If I execute in parallel multiple scans to different parts of the same region, they appear to be

jarFilePath for HTableDescriptor.addCoprocessor() with 0.94.2 vs 0.94.4

2013-02-11 Thread James Taylor
In 0.94.2, if the coprocessor class was on the HBase classpath, then the jarFilePath argument to HTableDescriptor.addCoprocessor seemed to essentially be ignored - it didn't matter if the jar could be found or not. In 0.94.4 we're getting an error if this is the case. Is there a way to
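
For reference, the two overloads in question, sketched against the 0.94 API; the class and jar names are placeholders:

    HTableDescriptor desc = new HTableDescriptor("t");
    // (a) no jar path: the class must already be on the region server classpath
    desc.addCoprocessor("com.example.MyRegionObserver");
    // (b) explicit jar path, which newer 0.94.x releases verify at add time
    desc.addCoprocessor("com.example.MyRegionObserver",
        new Path("hdfs:///hbase/lib/my-coprocessor.jar"),
        Coprocessor.PRIORITY_USER, null);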

Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

2013-02-12 Thread James Taylor
IMO, it's not safe to change the KV in-place. We always create a new KV in our coprocessors. James On Feb 12, 2013, at 6:41 AM, Mesika, Asaf asaf.mes...@gmail.com wrote: I'm seeing a very strange behavior: if I run a scan during major compaction, I can see both the modified Delta

Re: Row Key Design in time-based application

2013-02-17 Thread James Taylor
Hello, Have you considered using Phoenix (https://github.com/forcedotcom/phoenix) for this use case? Phoenix is a SQL layer on top of HBase. For this use case, you'd connect to your cluster like this: Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register driver Connection
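
The code in the snippet is cut off by the archive; a hedged reconstruction of the pattern it starts (the quorum host, table, and column are placeholders, not from the original mail):

    Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register driver
    Connection conn = DriverManager.getConnection("jdbc:phoenix:my-zk-quorum");
    // e.g. a time-range query; CURRENT_DATE() - 7 is seven days ago in Phoenix
    ResultSet rs = conn.createStatement().executeQuery(
        "SELECT * FROM event WHERE created_date > CURRENT_DATE() - 7");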

Re: Row Key Design in time-based application

2013-02-17 Thread James Taylor
spotting when using time as the key. Or the problem with always adding data to the right of the last row. The same would apply with the project id, assuming that it too is a number that grows incrementally with each project. On Feb 17, 2013, at 4:50 PM, James Taylor jtay...@salesforce.com wrote

availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread James Taylor
Unless I'm doing something wrong, it looks like the Maven repository (http://mvnrepository.com/artifact/org.apache.hbase/hbase) only contains HBase up to 0.94.3. Is there a different repo I should use, or if not, any ETA on when it'll be updated? James

Re: attributes - basic question

2013-02-22 Thread James Taylor
Same with us on Phoenix - we use the setAttribute on the client side and the getAttribute on the server side to pick up state on the Scan being executed. Works great. One thing to keep in mind, though: for a region observer coprocessor, the state you set on the client side will be sent to each
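
A minimal sketch of that pattern; the attribute name and value are made up, and the server side assumes a BaseRegionObserver subclass on 0.94:

    // Client side: attach per-scan state that travels with the Scan.
    Scan scan = new Scan();
    scan.setAttribute("_MyState", Bytes.toBytes("v1"));

    // Server side: each region's observer reads the attribute independently.
    @Override
    public RegionScanner preScannerOpen(ObserverContext<RegionCoprocessorEnvironment> c,
            Scan scan, RegionScanner s) throws IOException {
        byte[] state = scan.getAttribute("_MyState");
        // ... use state to configure the server-side scan behavior ...
        return s;
    }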

Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above

2013-02-25 Thread James Taylor
We are pleased to announce the immediate availability of Phoenix v 1.1, with support for HBase v 0.94.4 and above. Phoenix is a SQL layer on top of HBase. For details, see our announcement here: http://phoenix-hbase.blogspot.com/2013/02/annoucing-phoenix-v-11-support-for.html Thanks, James

Re: Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above

2013-02-26 Thread James Taylor
, Ted Yu yuzhih...@gmail.com wrote: I ran the test suite and it passed: Tests run: 452, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] BUILD SUCCESS Good job. On Mon, Feb 25, 2013 at 9:35 AM, James Taylor jtay

Re: Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above

2013-02-26 Thread James Taylor
., but it illustrates the idea. On 02/26/2013 09:59 AM, Ted Yu wrote: In the first graph on the performance page, what does 'key filter' represent? Thanks On Tue, Feb 26, 2013 at 9:53 AM, James Taylor jtay...@salesforce.com wrote: Both Phoenix and Impala provide SQL as a way to get at your data. Here

Re: Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above

2013-02-26 Thread James Taylor
You can query existing tables if the data is serialized in the way that Phoenix expects. For more detailed information and options, check out my response to this issue: https://github.com/forcedotcom/phoenix/issues/30 and check out our Data Type language reference here:

Re: endpoint coprocessor performance

2013-03-04 Thread James Taylor
Check your logs for whether your end-point coprocessor is hitting zookeeper on every invocation to figure out the region start key. Unfortunately (at least last time I checked), the default way of invoking an end point coprocessor doesn't use the meta cache. You can go through a combination of

Re: Rowkey design and presplit table

2013-03-07 Thread James Taylor
Another possible solution for you: use Phoenix: https://github.com/forcedotcom/phoenix Phoenix would allow you to model your scenario using SQL through JDBC, like this: Connection conn = DriverManager.getConnection("jdbc:phoenix:your zookeeper quorum"); Statement stmt = conn.createStatement(

Re: HBase type support

2013-03-15 Thread James Taylor
Hi Nick, What do you mean by hashing algorithms? Thanks, James On 03/15/2013 10:11 AM, Nick Dimiduk wrote: Hi David, Native support for a handful of hashing algorithms has also been discussed. Do you think these should be supported directly, as opposed to using a fixed-length String or

Re: HBase Client.

2013-03-20 Thread James Taylor
Another one to add to your list: 6. Phoenix (https://github.com/forcedotcom/phoenix) Thanks, James On Mar 20, 2013, at 2:50 AM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have used Kundera, persistence overhead on HBase API is minimal considering feature set available for use within

Re: Understanding scan behaviour

2013-03-29 Thread James Taylor
Mohith, Are you wanting to reduce the amount of data you're scanning and bring down your query time when:
- you have a multi-part row key of a string and a time value, and
- you know the prefix of the string and a range of the time value?
That's possible (but not easy) to do with

Re: HBase Types: Explicit Null Support

2013-04-01 Thread James Taylor
From the SQL perspective, handling null is important. Phoenix supports null in the following ways:
- the absence of a key value
- an empty value in a key value
- an empty value in a multi-part row key
- for variable length types (VARCHAR and DECIMAL) a null byte separator would be used if not

Re: HBase Types: Explicit Null Support

2013-04-01 Thread James Taylor
On 04/01/2013 04:41 PM, Nick Dimiduk wrote: On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.com wrote: From the SQL perspective, handling null is important. From your perspective, it is critical to support NULLs, even at the expense of fixed-width encodings at all

Essential column family performance

2013-04-07 Thread James Taylor
Hello, We're doing some performance testing of the essential column family feature, and we're seeing some performance degradation when comparing with and without the feature enabled: performance of the scan relative to not enabling the feature, by % of rows selected

Re: Essential column family performance

2013-04-07 Thread James Taylor
Max Lapan tried to address has the non-essential column family carrying considerably more data than the essential column family. Cheers On Sat, Apr 6, 2013 at 11:05 PM, James Taylor jtay...@salesforce.com wrote: Hello, We're doing some performance testing of the essential column family feature

Re: Essential column family performance

2013-04-08 Thread James Taylor
. Does your filter utilize the hint? It would be easier for me and other people to reproduce the issue you experienced if you put your scenario in some test similar to TestJoinedScanners. Will take a closer look at the code Monday. Cheers On Sun, Apr 7, 2013 at 11:37 AM, James Taylor jtay

Re: Best way to query multiple sets of rows

2013-04-08 Thread James Taylor
Hi Greame, Are you familiar with Phoenix (https://github.com/forcedotcom/phoenix), a SQL skin over HBase? We've just introduced a new feature (still in the master branch) that'll do what you're looking for: transparently doing a skip scan over the chunks of your HBase data based on your SQL

Re: Essential column family performance

2013-04-08 Thread James Taylor
would be larger lazy CFs and/or a low percentage of values selected. Can you try to increase the 2nd CF values' size and rerun the test? On Mon, Apr 8, 2013 at 10:38 AM, James Taylor jtay...@salesforce.com wrote: In the TestJoinedScanners.java, is the 40% randomly distributed or sequential? In our

Re: Speeding up the row count

2013-04-19 Thread James Taylor
Phoenix will parallelize within a region: SELECT count(1) FROM orders I agree with Ted, though, even serially, 100,000 rows shouldn't take anywhere near 6 mins. You say 100,000 rows. Can you tell us what it's ? Thanks, James On Apr 19, 2013, at 2:37 AM, Ted Yu yuzhih...@gmail.com wrote:

Re: Coprocessors

2013-04-25 Thread James Taylor
On 04/25/2013 03:35 PM, Gary Helmling wrote: I'm looking to write a service that runs alongside the region servers and acts as a proxy b/w my application and the region servers. I plan to use the logic in HBase client's HConnectionManager, to segment my request of 1M rowkeys into sub-requests per

Re: Coprocessors

2013-04-25 Thread James Taylor
Thanks for the additional info, Sudarshan. This would fit well with the implementation of Phoenix's skip scan. CREATE TABLE t ( object_id INTEGER NOT NULL, field_type INTEGER NOT NULL, attrib_id INTEGER NOT NULL, value BIGINT CONSTRAINT pk PRIMARY KEY (object_id, field_type,
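
The DDL above is truncated by the archive; a hedged completion, plus the kind of multi-point query Phoenix's skip scan serves (assumes a Phoenix JDBC Statement stmt, as in earlier examples):

    stmt.execute("CREATE TABLE t (" +
        " object_id INTEGER NOT NULL," +
        " field_type INTEGER NOT NULL," +
        " attrib_id INTEGER NOT NULL," +
        " value BIGINT" +
        " CONSTRAINT pk PRIMARY KEY (object_id, field_type, attrib_id))");
    // An IN over leading PK columns compiles to a skip scan, jumping between
    // the requested key ranges instead of scanning the whole table.
    ResultSet rs = stmt.executeQuery(
        "SELECT object_id, value FROM t WHERE object_id IN (1, 2, 3) AND field_type = 7");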

Re: Coprocessors

2013-04-25 Thread James Taylor
Our performance engineer, Mujtaba Chohan has agreed to put together a benchmark for you. We only have a four node cluster of pretty average boxes, but it should give you an idea. No performance impact for the attrib_id not being part of the PK since you're not filtering on them (if I

Re: HBase and Datawarehouse

2013-04-30 Thread James Taylor
Phoenix will succeed if HBase succeeds. Phoenix just makes it easier to drive HBase to its maximum capability. IMHO, if HBase is to make further gains in the OLAP space, scans need to be faster and new, more compressed columnar-store type block formats need to be developed. Running inside

Re: Read access pattern

2013-04-30 Thread James Taylor
bq. The downside that I see, is the bucket_number that we have to maintain both at time or reading/writing and update it in case of cluster restructuring. I agree that this maintenance can be painful. However, Phoenix (https://github.com/forcedotcom/phoenix) now supports salting, automating

Re: Very poor read performance with composite keys in hbase

2013-04-30 Thread James Taylor
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It'll use all of the parts of your row key and, depending on how much data you're returning back to the client, will query over 10 million rows in seconds. James @JamesPlusPlus http://phoenix-hbase.blogspot.com On Apr 30,

Re: Coprocessors

2013-05-01 Thread James Taylor
Sudarshan, Below are the results that Mujtaba put together. He put together two versions of your schema: one with the ATTRIBID as part of the row key and one with it as a key value. He also benchmarked the query time both when all of the data was in the cache and when all of the data was read

Re: Get all rows that DON'T have certain qualifiers

2013-05-14 Thread James Taylor
Hi Amit, Using Phoenix, the SQL skin over HBase (https://github.com/forcedotcom/phoenix), you'd do this: select * from myTable where value1 is null or value2 is null Regards, James http://phoenix-hbase.blogspot.com @JamesPlusPlus On May 14, 2013, at 6:56 AM, samar.opensource

[ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread James Taylor
We are pleased to announce the immediate availability of Phoenix 1.2 (https://github.com/forcedotcom/phoenix/wiki/Download). Here are some of the release highlights:
* Improve performance of multi-point and multi-range queries (20x plus) using new skip scan
* Support TopN queries (3-70x

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread James Taylor
similar stuff in https://issues.apache.org/jira/browse/HBASE-7474. I am interested in knowing the details about that implementation. Thanks, Anil Gupta On Thu, May 16, 2013 at 12:29 PM, James Taylor jtay...@salesforce.com wrote: We are pleased to announce the immediate availability of Phoenix 1.2

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-17 Thread James Taylor
name/classes? I haven't got the opportunity to try out Phoenix yet but I would like to have a look at the implementation. Thanks, Anil Gupta On Thu, May 16, 2013 at 4:15 PM, James Taylor jtay...@salesforce.com wrote: Hi Anil, No HBase changes were required. We're already leveraging coprocessors

Re: Some Hbase questions

2013-05-19 Thread James Taylor
Hi Vivek, Take a look at the SQL skin for HBase called Phoenix (https://github.com/forcedotcom/phoenix). Instead of using the native HBase client, you use regular JDBC and Phoenix takes care of making the native HBase calls for you. We support composite row keys, so you could form your row

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-20 Thread James Taylor
give you a bit more detail. Regards, James On 05/20/2013 04:07 AM, Azuryy Yu wrote: why off-list? it would be better share here. --Send from my Sony mobile. On May 18, 2013 12:14 AM, James Taylor jtay...@salesforce.com wrote: Anil, Yes, everything is in the Phoenix GitHub repo. Will give you

Re: aggregation performance

2012-05-03 Thread James Taylor
We've seen reasonable performance, with the caveat that you need to parallelize the scan doing the aggregation. In our benchmarking, we have the client scan each region in parallel and have a coprocessor aggregate the row count and return a single row back (with the client then totaling the
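
Not the benchmark code itself, but 0.94 ships a stock implementation of the same pattern (per-region parallel scans with a server-side counting coprocessor) that can stand in as an example; it requires AggregateImplementation to be loaded on the table:

    // Assumes org.apache.hadoop.hbase.client.coprocessor.* imports;
    // note that rowCount declares 'throws Throwable'.
    AggregationClient ac = new AggregationClient(HBaseConfiguration.create());
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q")); // column to count over
    long rows = ac.rowCount(Bytes.toBytes("my_table"), new LongColumnInterpreter(), scan);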

Re: HBase aggregate query

2012-09-11 Thread James Taylor
iwannaplay games funnlearnforkids@... writes: Hi, I want to run a query like select month(eventdate),scene,count(1),sum(timespent) from eventlog group by month(eventdate),scene in HBase. Through Hive it's taking a lot of time for 40 million records. Do we have any syntax in HBase to find

Re: HBase aggregate query

2012-09-13 Thread James Taylor
No, there's no sorted dimension. This would be a full table scan over 40M rows. This assumes the following:
1) your regions are evenly distributed across a four-node cluster
2) unique combinations of month * scene are small enough to fit into memory
3) you chunk it up on the client side and run
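
A rough sketch of item 3, client-side chunking with in-memory partial aggregation; the table handle, chunk boundaries, column family CF, and monthOf helper are all hypothetical:

    // One worker's share: scan a single chunk and build partial counts keyed
    // by (month, scene); the client merges the partial maps from all workers.
    Map<String, Long> partial = new HashMap<String, Long>();
    ResultScanner scanner = table.getScanner(new Scan(chunkStart, chunkStop));
    for (Result r : scanner) {
        long when = Bytes.toLong(r.getValue(CF, Bytes.toBytes("eventdate")));
        String scene = Bytes.toString(r.getValue(CF, Bytes.toBytes("scene")));
        String key = monthOf(when) + "|" + scene; // monthOf: hypothetical helper
        Long n = partial.get(key);
        partial.put(key, n == null ? 1L : n + 1L);
    }
    scanner.close();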

Re: querying hbase

2013-05-22 Thread James Taylor
Hey JM, Can you expand on what you mean? Phoenix is a single jar, easily deployed to any HBase cluster. It can map to existing HBase tables or create new ones. It allows you to use SQL (a fairly popular language) to query your data, and it surfaces its functionality as a JDBC driver so that

Re: querying hbase

2013-05-22 Thread James Taylor
Hi Aji, With Phoenix, you pass through the client port in your connection string, so this would not be an issue. If you're familiar with SQL Developer, then Phoenix supports something similar with SQuirrel: https://github.com/forcedotcom/phoenix#sql-client Regards, James On 05/22/2013 07:42

Re: querying hbase

2013-05-23 Thread James Taylor
I did not try Phoenix yet, but I think you need to upload the JAR on all the region servers first, and then restart them, right? People might not have the rights to do that. That's why I thought Phoenix was overkill regarding the need to just list a table content on a screen. JM 2013/5/22 James

Re: Couting number of records in a HBase table

2013-05-28 Thread James Taylor
Another option is Phoenix (https://github.com/forcedotcom/phoenix), where you'd do SELECT count(*) FROM my_table Regards, James On 05/28/2013 03:25 PM, Ted Yu wrote: Take a look at http://hbase.apache.org/book.html#rowcounter Cheers On Tue, May 28, 2013 at 3:23 PM, Shahab Yunus

Re: querying hbase

2013-05-31 Thread James Taylor
On 05/24/2013 02:50 PM, Andrew Purtell wrote: On Thu, May 23, 2013 at 5:10 PM, James Taylor jtay...@salesforce.com wrote: Have there been any discussions on running the HBase server in an OSGi container? I believe the only discussions have been on avoiding talk about coprocessor reloading

Re: querying hbase

2013-06-01 Thread James Taylor
...@apache.org wrote: On Thu, May 23, 2013 at 5:10 PM, James Taylor jtay...@salesforce.com wrote: Have there been any discussions on running the HBase server in an OSGi container? I believe the only discussions have been on avoiding talk about coprocessor reloading

Re: Scan performance

2013-06-22 Thread James Taylor
Hi Tony, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix), a SQL skin over HBase? It has a skip scan that will let you model a multi part row key and skip through it efficiently as you've described. Take a look at this blog for more info:

Re: HBase: Filters not working for negative integers

2013-06-26 Thread James Taylor
You'll need to flip the sign bit for ints and longs like Phoenix does. Feel free to borrow our serializers (in PDataType) or just use Phoenix. Thanks, James On 06/26/2013 12:16 AM, Madhukar Pandey wrote: Please ignore my previous mail..there was some copy paste issue in it.. this is the
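
The standard sign-bit flip being referred to, as a small self-contained sketch (plain HBase Bytes utilities, not Phoenix's actual PDataType code):

    // XOR the sign bit so that unsigned byte-wise comparison matches signed
    // numeric order: encodeInt(-1) sorts before encodeInt(0), whereas raw
    // Bytes.toBytes(int) puts all negatives after all positives.
    public static byte[] encodeInt(int v) {
        return Bytes.toBytes(v ^ Integer.MIN_VALUE);
    }
    public static int decodeInt(byte[] b) {
        return Bytes.toInt(b) ^ Integer.MIN_VALUE;
    }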

Re: Schema design for filters

2013-06-27 Thread James Taylor
Hi Kristoffer, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper and issue SQL queries through Phoenix for your filtering. James @JamesPlusPlus http://phoenix-hbase.blogspot.com On Jun 27, 2013, at 4:39 PM, Kristoffer

Re: Help in designing row key

2013-07-03 Thread James Taylor
Hi Flavio, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It will allow you to model your multi-part row key like this: CREATE TABLE flavio.analytics ( source INTEGER, type INTEGER, qual VARCHAR, hash VARCHAR, ts DATE CONSTRAINT pk PRIMARY KEY
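
The statement is cut off above; a hedged completion consistent with the columns shown (the PK column list and NOT NULL constraints are assumptions):

    stmt.execute("CREATE TABLE flavio.analytics (" +
        " source INTEGER NOT NULL, type INTEGER NOT NULL," +
        " qual VARCHAR NOT NULL, hash VARCHAR NOT NULL, ts DATE NOT NULL" +
        " CONSTRAINT pk PRIMARY KEY (source, type, qual, hash, ts))");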

Re: Help in designing row key

2013-07-03 Thread James Taylor
to have balanced regions as much as possible. So I think that in this case I will still use Bytes concatenation if someone confirms I'm doing it the right way. On Wed, Jul 3, 2013 at 12:33 PM, James Taylor jtay...@salesforce.com wrote: Hi Flavio, Have you had a look at Phoenix (https

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread James Taylor
Hey Kiru, Another option for you may be to use Phoenix (https://github.com/forcedotcom/phoenix). In particular, our skip scan may be what you're looking for: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html. Under the covers, the skip scan is doing a series of

Re: [ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-13 Thread James Taylor
Fantastic! Let me know if you're up for surfacing this through Phoenix. Regards, James On Tue, Aug 13, 2013 at 7:48 AM, Anil Gupta anilgupt...@gmail.com wrote: Excited to see this! Best Regards, Anil On Aug 13, 2013, at 6:17 AM, zhzf jeff jeff.z...@gmail.com wrote: very google local

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Would be interesting to compare against Phoenix's Skip Scan (http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html) which does a scan through a coprocessor and is more than 2x faster than multi Get (plus handles multi-range scans in addition to point gets). James On

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
), I will try to benchmark this table alone against Phoenix on another cluster. Thanks. Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From: James Taylor jtay...@salesforce.com To: user@hbase.apache.org user@hbase.apache.org Cc: Kiru

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
-- From: James Taylor jtay...@salesforce.com To: user@hbase.apache.org; Kiru Pakkirisamy kirupakkiris...@yahoo.com Sent: Sunday, August 18, 2013 2:07 PM Subject: Re: Client Get vs Coprocessor scan performance Kiru, If you're able to post the key values, row key

Re: Client Get vs Coprocessor scan performance

2013-08-19 Thread James Taylor
it). Is there a way to do a sort of user-defined function on a column that would take care of my calculation on that double? Thanks again. Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From: James Taylor jtay...@salesforce.com

Re: how to export data from hbase to mysql?

2013-08-27 Thread James Taylor
Or if you'd like to be able to use SQL directly on it, take a look at Phoenix (https://github.com/forcedotcom/phoenix). James On Aug 27, 2013, at 8:14 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Take a look at sqoop? On 2013-08-27 23:08, ch huang justlo...@gmail.com wrote:

Re: HBase - stable versions

2013-09-04 Thread James Taylor
+1 to what Nicolas said. That goes for Phoenix as well. It's open source too. We do plan to port to 0.96 when our user community (Salesforce.com, of course, being one of them) demands it. Thanks, James On Wed, Sep 4, 2013 at 10:11 AM, Nicolas Liochon nkey...@gmail.com wrote: It's open

Re: Concurrent connections to Hbase

2013-09-05 Thread James Taylor
Hey Kiru, The Phoenix team would be happy to work with you to benchmark your performance if you can give us specifics about your schema design, queries, and data sizes. We did something similar for Sudarshan for a Bloomberg use case here[1]. Thanks, James [1].

Re: deploy salesforce phoenix coprocessor to hbase/lib??

2013-09-10 Thread James Taylor
When a table is created with Phoenix, its HBase table is configured with the Phoenix coprocessors. We do not specify a jar path, so the Phoenix jar that contains the coprocessor implementation classes must be on the classpath of the region server. In addition to coprocessors, Phoenix relies on

Re: Reply: Fastest way to get count of records in huge hbase table?

2013-09-10 Thread James Taylor
Use Phoenix (https://github.com/forcedotcom/phoenix) by doing the following: CREATE VIEW myHTableName (key VARBINARY NOT NULL PRIMARY KEY); SELECT COUNT(*) FROM myHTableName; As fenghong...@xiaomi.com said, you still need to scan the table, but Phoenix will do it in parallel and use a coprocessor

Re: deploy salesforce phoenix coprocessor to hbase/lib??

2013-09-11 Thread James Taylor
/lib? Our customers said it has to. But I feel it is unnecessary and weird. Can you confirm? Thanks Tian-Ying -Original Message- From: James Taylor [mailto:jtay...@salesforce.com] Sent: Tuesday, September 10, 2013 4:40 PM To: user@hbase.apache.org Subject: Re: deploy saleforce

Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-24 Thread James Taylor
Hey Anil, The solution you've described is the best we've found for Phoenix (inspired by the work of Alex at Sematext). You can do all of this in a few lines of SQL: CREATE TABLE event_data( who VARCHAR, type SMALLINT, id BIGINT, when DATE, payload VARBINARY CONSTRAINT pk PRIMARY KEY
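
Again truncated by the archive; one plausible completion (the PK ordering is a guess, and a column named when may need quoting since WHEN is a SQL keyword):

    stmt.execute("CREATE TABLE event_data (" +
        " who VARCHAR NOT NULL, type SMALLINT NOT NULL," +
        " \"when\" DATE NOT NULL, id BIGINT NOT NULL, payload VARBINARY" +
        " CONSTRAINT pk PRIMARY KEY (who, type, \"when\", id))");
    // Time-based range scans within one entity then stay contiguous, e.g.
    // SELECT * FROM event_data WHERE who = ? AND type = ? AND "when" > ?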

Re: row filter - binary comparator at certain range

2013-10-21 Thread James Taylor
Take a look at Phoenix (https://github.com/forcedotcom/phoenix). It supports both salting and fuzzy row filtering through its skip scan. On Sun, Oct 20, 2013 at 10:42 PM, Premal Shah premal.j.s...@gmail.com wrote: Have you looked at FuzzyRowFilter? Seems to me that it might satisfy your

Re: row filter - binary comparator at certain range

2013-10-21 Thread James Taylor
Phoenix restricts salting to a single byte. Salting perhaps is misnamed, as the salt byte is a stable hash based on the row key. Phoenix's skip scan supports sub-key ranges. We've found salting in general to be faster (though there are cases where it's not), as it ensures better parallelization.
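
What a 'stable hash' means here, as an illustrative sketch (not Phoenix's exact implementation):

    // Prepend one byte derived deterministically from the row key: the same
    // row always lands in the same bucket, while writes spread across the
    // bucket-presplit regions.
    public static byte[] salt(byte[] rowKey, int buckets) {
        int hash = java.util.Arrays.hashCode(rowKey); // any stable hash works
        byte bucket = (byte) Math.abs(hash % buckets);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = bucket;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }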

Re: row filter - binary comparator at certain range

2013-10-21 Thread James Taylor
this is the base access pattern. HTH -Mike On Oct 21, 2013, at 11:37 AM, James Taylor jtay...@salesforce.com wrote: Phoenix restricts salting to a single byte. Salting perhaps is misnamed, as the salt byte is a stable hash based on the row key. Phoenix's skip scan supports sub-key

Re: row filter - binary comparator at certain range

2013-10-21 Thread James Taylor
of your regions will be 1/2 the max size… but the size you really want and 8-16 regions will be up to twice as big. On Oct 21, 2013, at 3:26 PM, James Taylor jtay...@salesforce.com wrote: What do you think it should be called, because prepending-row-key-with-single-hashed-byte doesn't have

Re: row filter - binary comparator at certain range

2013-10-21 Thread James Taylor
to, so you end up with all regions half filled except for the last region in each 'modded' value. I wouldn't say its a bad thing if you plan for it. On Oct 21, 2013, at 5:07 PM, James Taylor jtay...@salesforce.com wrote: We don't truncate the hash, we mod it. Why would you expect that data

[ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread James Taylor
The Phoenix team is pleased to announce the immediate availability of Phoenix 2.1 [1]. More than 20 individuals contributed to the release. Here are some of the new features now available:
* Secondary Indexing [2] to create and automatically maintain global indexes over your primary table. -

Re: [ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread James Taylor
yuzhih...@gmail.com wrote: From https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing : Is date_col a column from data table ? CREATE INDEX my_index ON my_table (date_col DESC, v1) INCLUDE (v3) SALT_BUCKETS=10, DATA_BLOCK_ENCODING='NONE'; On Thu, Oct 24, 2013 at 5:24 PM, James

Re: HBASE help

2013-10-28 Thread James Taylor
Take a look at Phoenix (https://github.com/forcedotcom/phoenix) which will allow you to issue SQL to directly create tables, insert data, and run queries over HBase using the data model described below. Thanks, James On Oct 28, 2013, at 8:47 AM, saiprabhur saiprab...@gmail.com wrote: Hi Folks,

Re: [ANNOUNCE] Phoenix v 2.1 released

2013-10-28 Thread James Taylor
. as fast or faster than a batched get). Thanks, James On Mon, Oct 28, 2013 at 11:14 AM, Asaf Mesika asaf.mes...@gmail.com wrote: I couldn't get the Row Value Constructor feature. Do you perhaps have a real world use case to demonstrate this? On Friday, October 25, 2013, James Taylor wrote

Re: hbase suitable for churn analysis ?

2013-11-14 Thread James Taylor
We ingest logs using Pig to write Phoenix-compliant HFiles, load those into HBase and then use Phoenix (https://github.com/forcedotcom/phoenix) to query directly over the HBase data through SQL. Regards, James On Thu, Nov 14, 2013 at 9:35 AM, sam wu swu5...@gmail.com wrote: we ingest data

Re: How to get Metadata information in Hbase

2013-11-25 Thread James Taylor
One other tool option for you is to use Phoenix. You use SQL to create a table and define the columns through standard DDL. Your columns make up the allowed KeyValues for your table and the metadata is surfaced through the standard JDBC metadata APIs (with column family mapping to table catalog).

Re: HFile block size

2013-11-25 Thread James Taylor
FYI, you can define BLOCKSIZE in your hbase-site.xml, just like with HBase, to make it global. Thanks, James On Mon, Nov 25, 2013 at 9:08 PM, Azuryy Yu azury...@gmail.com wrote: There is no way to declare a global property in Phoenix, you have to declare BLOCKSIZE in each 'create' SQL. such

Re: HBase Phoenix questions

2013-11-27 Thread James Taylor
Amit, So sorry we didn't answer your question before - I'll post an answer now over on our mailing list. Thanks, James On Wed, Nov 27, 2013 at 8:46 AM, Amit Sela am...@infolinks.com wrote: I actually asked some of these questions in the phoenix-hbase-user googlegroup but never got an

Re: Online/Realtime query with filter and join?

2013-12-02 Thread James Taylor
I agree with Doug Meil's advice. Start with your row key design. In Phoenix, your PRIMARY KEY CONSTRAINT defines your row key. You should lead with the columns that you'll filter against most frequently. Then, take a look at adding secondary indexes to speed up queries against other columns.

[ANNOUNCE] Phoenix accepted as Apache incubator

2013-12-13 Thread James Taylor
The Phoenix team is pleased to announce that Phoenix[1] has been accepted as an Apache incubator project[2]. Over the next several weeks, we'll move everything over to Apache and work toward our first release. Happy to be part of the extended family. Regards, James [1]

Re: Errors :Undefined table and DoNotRetryIOException while querying from phoenix to hbase

2013-12-14 Thread James Taylor
Mathan, We already answered your question on the Phoenix mailing list. If you have a follow up question, please post it there. This is not an HBase issue. Thanks, James On Dec 14, 2013, at 2:10 PM, mathan kumar immathanku...@gmail.com wrote: -- Forwarded message -- From: x

Re: Performance tuning

2013-12-21 Thread James Taylor
FYI, scanner caching defaults to 1000 in Phoenix, but as folks have pointed out, that's not relevant in this case b/c only a single row is returned from the server for a COUNT(*) query. On Sat, Dec 21, 2013 at 2:51 PM, Kristoffer Sjögren sto...@gmail.com wrote: Yeah, I'm doing a count(*) query

Re: secondary index feature

2013-12-23 Thread James Taylor
Henning, Jesse Yates wrote the back-end of our global secondary indexing system in Phoenix. He designed it as a separate, pluggable module with no Phoenix dependencies. Here's an overview of the feature: https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing. The section that discusses the

Re: use hbase as distributed crawl's scheduler

2014-01-02 Thread James Taylor
Otis, I didn't realize Nutch uses HBase underneath. Might be interesting if you serialized data in a Phoenix-compliant manner, as you could run SQL queries directly on top of it. Thanks, James On Thu, Jan 2, 2014 at 10:17 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Have a

Re: use hbase as distributed crawl's scheduler

2014-01-02 Thread James Taylor
Hi LiLi, Have a look at Phoenix (http://phoenix.incubator.apache.org/). It's a SQL skin on top of HBase. You can model your schema and issue your queries just like you would with MySQL. Something like this: // Create table that optimizes for your most common query // (i.e. the PRIMARY KEY
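
The example is cut off mid-comment; a hedged guess at its shape (the schema and query are illustrative, not the original, and stmt is a Phoenix JDBC Statement):

    stmt.execute("CREATE TABLE crawl_queue (" +
        " priority INTEGER NOT NULL, added DATE NOT NULL, url VARCHAR NOT NULL" +
        " CONSTRAINT pk PRIMARY KEY (priority, added, url))");
    // Most common query: the next batch of URLs, by priority then FIFO order,
    // which is a simple range scan because it matches the PK order.
    ResultSet rs = stmt.executeQuery(
        "SELECT url FROM crawl_queue ORDER BY priority, added LIMIT 100");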

Re: use hbase as distributed crawl's scheduler

2014-01-03 Thread James Taylor
in your cluster. You can read more about salting here: http://phoenix.incubator.apache.org/salted.html On Thu, Jan 2, 2014 at 11:36 PM, Li Li fancye...@gmail.com wrote: thank you. it's great. On Fri, Jan 3, 2014 at 3:15 PM, James Taylor jtay...@salesforce.com wrote: Hi LiLi, Have a look

Re: use hbase as distributed crawl's scheduler

2014-01-03 Thread James Taylor
do parallel scans for each bucket and do a merge sort on the client, so the cost is pretty low for this (we also provide a way of turning this off if your use case doesn't need it). Two years, JM? Now you're really going to have to start using Phoenix :-) On Friday, January 3, 2014, James Taylor

Re: secondary index feature

2014-01-03 Thread James Taylor
love to see if your implementation can fit into the framework we wrote - we would be happy to work to see if it needs some more hooks or modifications - I have a feeling this is pretty much what you guys will need -Jesse On Mon, Dec 23, 2013 at 10:01 AM, James Taylor jtay

Re: use hbase as distributed crawl's scheduler

2014-01-03 Thread James Taylor
great but it's now only an experimental project. I want to use only HBase. Could you tell me the difference between Phoenix and HBase? If I use HBase only, how should I design the schema and some extra things for my goal? thank you On Sat, Jan 4, 2014 at 3:41 AM, James Taylor jtay...@salesforce.com

Re: use hbase as distributed crawl's scheduler

2014-01-04 Thread James Taylor
? On Sat, Jan 4, 2014 at 3:43 PM, James Taylor jtay...@salesforce.com wrote: Hi LiLi, Phoenix isn't an experimental project. We're on our 2.2 release, and many companies (including the company for which I'm employed, Salesforce.com) use it in production today. Thanks, James

Re: Question on efficient, ordered composite keys

2014-01-14 Thread James Taylor
Hi Henning, My favorite implementation of efficient composite row keys is Phoenix. We support composite row keys whose byte representation sorts according to the natural sort order of the values (inspired by Lily). You can use our type system independent of querying/inserting data with Phoenix,

Re: HBase load distribution vs. scan efficiency

2014-01-20 Thread James Taylor
Hi William, Phoenix uses this bucket mod solution as well (http://phoenix.incubator.apache.org/salted.html). For the scan, you have to run it in every possible bucket. You can still do a range scan, you just have to prepend the bucket number to the start/stop key of each scan you do, and then you
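
A sketch of the trick being described; BUCKETS and the user's start/stop keys are placeholders:

    // Re-run the same logical key range once per bucket by prepending the
    // bucket byte; results are merged (and re-sorted) client-side.
    List<Scan> scans = new ArrayList<Scan>();
    for (int b = 0; b < BUCKETS; b++) {
        byte[] salt = new byte[] { (byte) b };
        scans.add(new Scan(Bytes.add(salt, userStartKey), Bytes.add(salt, userStopKey)));
    }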

Re: HBase load distribution vs. scan efficiency

2014-01-20 Thread James Taylor
, Jan 20, 2014 at 8:15 PM, James Taylor jtay...@salesforce.com wrote: Hi William, Phoenix uses this bucket mod solution as well ( http://phoenix.incubator.apache.org/salted.html). For the scan, you have to run it in every possible bucket. You can still do a range scan, you just have

Re: creating tables from mysql to hbase

2014-02-18 Thread James Taylor
Hi Jignesh, Phoenix has support for multi-tenant tables: http://phoenix.incubator.apache.org/multi-tenancy.html. Also, your primary key constraint would transfer over as-is, since Phoenix supports composite row keys. Essentially your pk constraint values get concatenated together to form your row
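
For illustration, roughly what that looks like; the schema is invented, while MULTI_TENANT and the TenantId connection property follow the linked doc:

    stmt.execute("CREATE TABLE orders (" +
        " tenant_id VARCHAR NOT NULL, order_id BIGINT NOT NULL, total DECIMAL" +
        " CONSTRAINT pk PRIMARY KEY (tenant_id, order_id)) MULTI_TENANT=true");
    // A tenant-specific connection is then scoped to that tenant's rows:
    Properties props = new Properties();
    props.setProperty("TenantId", "acme");
    Connection tenantConn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);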
