RE: HBase Schema for IPTC News ML G2

2014-03-04 Thread Vladimir Rodionov
...@carrieriq.com From: Jigar Shah [jigar.s...@infodesk.com] Sent: Monday, March 03, 2014 9:24 PM To: user@hbase.apache.org Subject: Re: HBase Schema for IPTC News ML G2 Hello Ted, I can think of an implementation based on the solution you provided. Current

Re: HBase Schema for IPTC News ML G2

2014-03-04 Thread Jigar Shah
Thanks, Jigar Shah. From: Jigar Shah [jigar.s...@infodesk.com] Sent: Monday, March 03, 2014 9:24 PM To: user@hbase.apache.org Subject: Re: HBase Schema for IPTC News ML G2 Hello Ted, I can think of an implementation based on the solution you provided

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Boaretto, Ricardo
Hi, How frequently do you need to query older versions of a message? Regards, Ricardo Boaretto. On Mar 3, 2014 4:31 AM, Jigar Shah jigar.s...@infodesk.com wrote: I am working in the news processing industry; the current system processes more than a million articles per week, and provides this data in

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Jigar Shah
Hello Boaretto, Ricardo, Thanks for the reply. Queries on older versions of a message are less frequent. The application provides a flag ignoreOldRevisions (default value is 'true'). Latest versions are of more importance in general, but the system still needs to keep track of all versions received for
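A minimal sketch of what the ignoreOldRevisions flag could mean at the application level, assuming every stored item carries a guid and a numeric version. The NewsItem record and the select method are hypothetical names for illustration, not part of the poster's system: with the flag on, only the highest version per guid is returned; with it off, every stored version is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RevisionFilter {
    // Hypothetical stored item: one entry per (guid, version) pair.
    public record NewsItem(String guid, long version, String headline) {}

    // ignoreOldRevisions == true  -> keep only the highest version of each guid.
    // ignoreOldRevisions == false -> return every stored version unchanged.
    public static List<NewsItem> select(List<NewsItem> items, boolean ignoreOldRevisions) {
        if (!ignoreOldRevisions) {
            return new ArrayList<>(items);
        }
        // LinkedHashMap preserves the order in which each guid was first seen.
        Map<String, NewsItem> latest = new LinkedHashMap<>();
        for (NewsItem item : items) {
            latest.merge(item.guid(), item,
                (current, candidate) -> current.version() >= candidate.version() ? current : candidate);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<NewsItem> items = List.of(
            new NewsItem("urn:a", 1, "Draft"),
            new NewsItem("urn:a", 2, "Update"),
            new NewsItem("urn:b", 1, "Other story"));
        System.out.println(select(items, true).size());  // 2
        System.out.println(select(items, false).size()); // 3
    }
}
```

Since older versions are queried less often, this kind of filtering is cheap to do in the application; the schema question is mainly about making the "latest version" path fast.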

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Ted Yu
When version is in its own column family, you can utilize essential column family support. See https://issues.apache.org/jira/browse/HBASE-5416 Cheers On Mar 2, 2014, at 11:31 PM, Jigar Shah jigar.s...@infodesk.com wrote: I am working in the news processing industry; the current system processes

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Jigar Shah
Hi Ted, Thanks for the reply. I am more concerned about the structure: what should the rowKey and column families be (would having each version of a news item as a column family be a good idea?). Will there be any problem if I orient my data this way? |rowKey| | column-families| guid

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Ted Yu
There seems to be some misunderstanding. The column families need to be defined at the time of table creation. My understanding was that there would be one column family called version. Each row in this table would have a version number (1, 2, 3, etc.) in the version column family, along with

HBase Schema for IPTC News ML G2

2014-03-03 Thread Jigar Shah
I am working in the news processing industry; the current system processes more than a million articles per week and provides this data in real time to users. Additionally, it provides search capabilities via Lucene. We convert all news to a standard IPTC NewsML

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread James Taylor
Hi Jigar, Take a look at Apache Phoenix: http://phoenix.incubator.apache.org/ It allows you to use SQL to query over your HBase data and supports composite primary keys, so you could create a schema like this: create table news_message(guid varchar not null, version bigint not null,
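A composite primary key such as (guid, version) ultimately becomes a single HBase row key, and HBase keeps rows sorted by unsigned byte order. The sketch below shows one possible hand-rolled encoding, not necessarily the byte layout Phoenix itself uses: UTF-8 guid, a 0x00 separator, then Long.MAX_VALUE - version as 8 big-endian bytes so the newest revision of a guid sorts first. It assumes guids never contain a 0x00 byte and versions are non-negative; the NewsRowKey class and its methods are hypothetical, and a TreeMap with Arrays::compareUnsigned stands in for an HBase prefix scan.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NewsRowKey {
    private static final byte SEPARATOR = 0x00;

    // Row key = UTF-8 guid + 0x00 + big-endian (Long.MAX_VALUE - version).
    // Inverting the version makes higher versions sort earlier within a guid.
    public static byte[] encode(String guid, long version) {
        byte[] g = guid.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(g.length + 1 + Long.BYTES)
            .put(g).put(SEPARATOR).putLong(Long.MAX_VALUE - version).array();
    }

    // Reads the trailing 8 bytes back into the original version number.
    public static long decodeVersion(byte[] key) {
        return Long.MAX_VALUE - ByteBuffer.wrap(key, key.length - Long.BYTES, Long.BYTES).getLong();
    }

    // Prefix shared by every version of one guid: guid bytes + separator.
    public static byte[] prefix(String guid) {
        byte[] g = guid.getBytes(StandardCharsets.UTF_8);
        byte[] p = Arrays.copyOf(g, g.length + 1);
        p[g.length] = SEPARATOR;
        return p;
    }

    // Simulated prefix scan: walk keys >= prefix until a key no longer starts with it.
    public static List<Long> versionsNewestFirst(NavigableMap<byte[], String> table, String guid) {
        byte[] p = prefix(guid);
        List<Long> out = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.tailMap(p, true).entrySet()) {
            byte[] k = e.getKey();
            if (k.length < p.length || Arrays.compareUnsigned(k, 0, p.length, p, 0, p.length) != 0) {
                break; // left this guid's key range
            }
            out.add(decodeVersion(k));
        }
        return out;
    }

    public static void main(String[] args) {
        // Unsigned lexicographic comparison mirrors HBase row key ordering.
        NavigableMap<byte[], String> table = new TreeMap<>(Arrays::compareUnsigned);
        table.put(encode("urn:news:1", 1), "v1");
        table.put(encode("urn:news:1", 3), "v3");
        table.put(encode("urn:news:1", 2), "v2");
        table.put(encode("urn:news:10", 7), "other story");
        System.out.println(versionsNewestFirst(table, "urn:news:1")); // [3, 2, 1]
    }
}
```

With this layout the latest revision is simply the first row of a prefix scan on the guid, which suits the default ignoreOldRevisions case, while all older revisions remain stored in adjacent rows.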

Re: HBase Schema for IPTC News ML G2

2014-03-03 Thread Jigar Shah
Thanks James, it seems very interesting. On 03/04/2014 03:02 AM, James Taylor wrote: Hi Jigar, Take a look at Apache Phoenix: http://phoenix.incubator.apache.org/ It allows you to use SQL to query over your HBase data and supports composite primary keys, so you could create a schema like this:

HBase Schema for IPTC News ML G2

2014-03-02 Thread Jigar Shah
I am working in the news processing industry; the current system processes more than a million articles per week and provides this data in real time to users. Additionally, it provides search capabilities via Lucene. We convert all news to a standard IPTC NewsML