Re: secondary index feature

2013-12-23 Thread Henning Blohm
Thanks for pointing me to Lily. I read about it, but it seems to add significant additional infrastructure, which I am trying to avoid for fear of unwanted complexity. That may be unjustified, and I may need to take another look. Henning On 22.12.2013 17:41, Anoop John wrote: HIndex

Re: secondary index feature

2013-12-23 Thread Henning Blohm
Lars, that is exactly why I am hesitant to use one of the core-level generic approaches (apart from having difficulty identifying the still-active projects): I have doubts I can sufficiently explain to myself when and where they fail. With toolbox approach I meant to say that turning entity

RowCounter ClassNotFoundException: com.google.common.base.Preconditions

2013-12-23 Thread Jean-Marc Spaggiari
Any idea why I'm getting this? Error: java.lang.ClassNotFoundException: com.google.common.base.Preconditions at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native

Re: RowCounter ClassNotFoundException: com.google.common.base.Preconditions

2013-12-23 Thread Ted Yu
Which version of HBase do you use? Can you show us the command line for RowCounter? Thanks On Mon, Dec 23, 2013 at 8:11 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Any idea why I'm getting this? Error: java.lang.ClassNotFoundException: com.google.common.base.Preconditions

HBase -ROOT- table has some HDFS blocks missing (v0.92)

2013-12-23 Thread ibrahim nelman
Hi All, I just shut down my 10-node HBase cluster (0.92) and restarted the machines. After the restart, I cannot start HBase because there is an HDFS file belonging to the -ROOT- table whose block is completely missing. Here is what hadoop fsck returns:

Re: RowCounter ClassNotFoundException: com.google.common.base.Preconditions

2013-12-23 Thread Jean-Marc Spaggiari
I'm running HBase 0.94.15. time /home/hadoop/hadoop-1.2.1/bin/hadoop jar /home/hbase/hbase-0.94.3/hbase-0.94.15.jar rowcounter -Dmapred.map.tasks.speculative.execution=false -Dhbase.client.scanner.caching=100 page_proposed I also tried to copy the guava jar into hadoop lib directory with no
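The ClassNotFoundException in Jean-Marc's job typically means the Guava jar is not on the MapReduce task classpath (common workarounds are adding it to HADOOP_CLASSPATH or passing it via -libjars). A minimal, hypothetical probe to check class visibility from the driver side might look like this — class and method names here are illustrative, not from the original thread:

```java
// Hypothetical sketch: probe whether a class is visible on the current
// classpath, mirroring the lookup the M/R task is failing to perform.
public class ClasspathProbe {
    static boolean visible(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always visible via the bootstrap loader.
        System.out.println(visible("java.lang.String"));
        // Guava's Preconditions is visible only if a guava jar is on the classpath.
        System.out.println(visible("com.google.common.base.Preconditions"));
    }
}
```

Running this inside the same JVM/classpath the job uses would confirm whether the dependency is actually reachable before blaming HBase itself.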

Re: secondary index feature

2013-12-23 Thread James Taylor
Henning, Jesse Yates wrote the back-end of our global secondary indexing system in Phoenix. He designed it as a separate, pluggable module with no Phoenix dependencies. Here's an overview of the feature: https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing. The section that discusses the

Consistent Backup strategy

2013-12-23 Thread Timo Schaepe
Hey guys, we are searching for a consistent backup strategy with the export tool. Is this article still up to date, and can I use it? http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html Thanks for answers. cheers, Timo

RE: Consistent Backup strategy

2013-12-23 Thread Vladimir Rodionov
Offline snapshots? Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Timo Schaepe [t...@timoschaepe.de] Sent: Monday, December 23, 2013 10:53 AM To: user@hbase.apache.org

Re: Consistent Backup strategy

2013-12-23 Thread Timo Schaepe
Sorry, I forgot to mention. Taking the cluster offline is not an option. We need a consistent backup of an online cluster. Our plan B is to build a second cluster for replication and take offline snapshots from that cluster. bye, Timo Am 23.12.2013 um 11:02 schrieb Vladimir Rodionov

Re: Consistent Backup strategy

2013-12-23 Thread Matteo Bertozzi
Can you define what consistent means to you? For example, online snapshots are row-consistent, but the snapshot of Region 1 may be taken at time T0 and the snapshot of Region N at time T0 + X seconds Matteo On Mon, Dec 23, 2013 at 7:07 PM, Timo Schaepe t...@timoschaepe.de wrote: Sorry, I

Re: secondary index feature

2013-12-23 Thread Henning Blohm
James, that is super interesting material! Thanks, Henning On 23.12.2013 19:01, James Taylor wrote: Henning, Jesse Yates wrote the back-end of our global secondary indexing system in Phoenix. He designed it as a separate, pluggable module with no Phoenix dependencies. Here's an overview of

Re: secondary index feature

2013-12-23 Thread Jesse Yates
The work that James is referencing grew out of the discussions Lars and I had (which led to those blog posts). The solution we implemented is designed to be generic, as James mentioned above, but was written with all the hooks necessary for Phoenix to do some really fast updates (or skipping

Re: Consistent Backup strategy

2013-12-23 Thread Timo Schaepe
With consistent I mean it as in the mentioned blog article from Lars. We want to take a backup of our data at one specific point in time, such that the data is consistent for that time range. My thoughts so far: I want to do a full backup of our data every Saturday. During the week I want to take

Performance between HBaseClient scan and HFileReaderV2

2013-12-23 Thread Jerry Lam
Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced makes sense. The experiment is as follows: - I filled an HBase region with 700MB of data (each row has roughly 45 columns, and the entire row is about 20KB) - I configured the region to hold

Re: Consistent Backup strategy

2013-12-23 Thread Matteo Bertozzi
If you can rely on timestamps, you can use the Export tool as shown in the blog post without problems. The Export command-line interface has not changed. Matteo On Mon, Dec 23, 2013 at 8:03 PM, Timo Schaepe t...@timoschaepe.de wrote: With consistent I mean it like in the mentioned blog article
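For the incremental part of Timo's plan, Export accepts optional starttime/endtime millisecond arguments (hbase org.apache.hadoop.hbase.mapreduce.Export &lt;table&gt; &lt;outdir&gt; [&lt;versions&gt; [&lt;starttime&gt; [&lt;endtime&gt;]]]). A small, hypothetical helper for computing one day's window — the class and method names are illustrative, not from the thread:

```java
// Hypothetical sketch: compute the [starttime, endtime) millisecond window
// to pass to the Export tool for one day's incremental backup.
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class BackupWindow {
    static long[] window(LocalDate day, ZoneId zone) {
        // Start of the given day, and start of the next day, as epoch millis.
        long start = day.atStartOfDay(zone).toInstant().toEpochMilli();
        long end = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long[] w = window(LocalDate.of(2013, 12, 22), ZoneOffset.UTC);
        // These two values would be supplied to Export as <starttime> <endtime>.
        System.out.println(w[0] + " " + w[1]);
    }
}
```

Because the window is half-open and derived from the same clock arithmetic each day, consecutive incremental exports neither overlap nor leave gaps, matching the timestamp-based approach in Lars's blog post.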

Re: Consistent Backup strategy

2013-12-23 Thread lars hofhansl
We're doing a version of that at Salesforce (we have our own M/R jobs, but the principle is the same). Soon we'll run the backup M/R job over a snapshot for performance reasons, but even then the principle is the same. Specifically, we're keeping 48h worth of live data in HBase itself (TTL=48h,

Schema Design Newbie Question

2013-12-23 Thread Kamal Bahadur
Hello, I am just starting to use HBase and I am coming from the Cassandra world. Here is a quick background on my data: My system will be storing data that belongs to a certain category. Currently I have around 1000 categories. Also note that some categories produce a lot more data than others.

Re: Schema Design Newbie Question

2013-12-23 Thread Dhaval Shah
1000 CFs with HBase does not sound like a good idea. category + timestamp sounds like the better of the two options you have thought of. Can you tell us a little more about your data? Regards, Dhaval From: Kamal Bahadur mailtoka...@gmail.com To:
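The category + timestamp composite rowkey Dhaval suggests could be sketched as follows — this is a minimal, assumed layout (fixed-width category field, zero-padded epoch millis), not anything prescribed in the thread; hot categories may additionally need a salt prefix to avoid region hotspotting:

```java
// Hypothetical sketch of a "category + timestamp" composite rowkey, so that
// all rows of one category sort contiguously and in time order.
import java.nio.charset.StandardCharsets;

public class RowKeys {
    static byte[] rowKey(String category, long epochMillis) {
        // Fixed-width category (space-padded to 12 chars) followed by a
        // zero-padded 13-digit epoch-millis timestamp; fixed widths keep
        // the lexicographic byte order aligned with (category, time).
        String key = String.format("%-12s%013d", category, epochMillis);
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = rowKey("cat0042", 1387843200000L);
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```

A scan for one category over a time range then becomes a simple start-row/stop-row scan, with no filter needed and only one column family involved.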

Re: Schema Design Newbie Question

2013-12-23 Thread Kamal Bahadur
Hi Dhaval, Thanks for the quick response! Why do you think having more files is not a good idea? Is it because of OS restrictions? I get around 50 million records a day and each record contains ~25 columns. Values for each column are ~30 characters. Kamal On Mon, Dec 23, 2013 at 3:35 PM,

Re: Schema Design Newbie Question

2013-12-23 Thread lars hofhansl
The HDFS NameNode will have to deal with lots of small files (currently HBase cannot flush column families independently, so if one is flushed, all of them are). The other reason is that scanning will be slow (if your scan involves many column families, due to the merge sort HBase needs to

Re: Schema Design Newbie Question

2013-12-23 Thread Kamal Bahadur
I am now convinced that option 1 will be the best option for my data. Thanks Lars! Kamal On Mon, Dec 23, 2013 at 4:12 PM, lars hofhansl la...@apache.org wrote: The HDFS NameNode will have to deal with lots of small files (currently HBase cannot flush column families independently, so if one

Re: RowCounter ClassNotFoundException: com.google.common.base.Preconditions

2013-12-23 Thread Jean-Marc Spaggiari
I did some investigating. Strange: ClassLoader parent = Classes.class.getClassLoader(); returns null. Can this be related to the JDK I use (1.7.0_45)? 2013/12/23 Jean-Marc Spaggiari jean-m...@spaggiari.org I'm running HBase 0.94.15. time /home/hadoop/hadoop-1.2.1/bin/hadoop jar
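A null return from getClassLoader() is documented JDK behavior for classes loaded by the bootstrap loader, so by itself it is not JDK-version specific. A quick sketch demonstrating both cases (the class name here is illustrative):

```java
// Demonstrates that getClassLoader() legitimately returns null for
// bootstrap-loaded classes, but not for application classes.
public class ClassLoaderProbe {
    public static void main(String[] args) {
        // java.lang.String is loaded by the bootstrap loader -> null.
        System.out.println(String.class.getClassLoader() == null);
        // This class is loaded by the system/application loader -> non-null.
        System.out.println(ClassLoaderProbe.class.getClassLoader() != null);
    }
}
```

So the interesting question is not the null itself, but which loader HBase's Classes utility falls back to when its own class reports null.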

Re: RowCounter ClassNotFoundException: com.google.common.base.Preconditions

2013-12-23 Thread lars hofhansl
I'd bet this is the same issue that got you the strange AbstractMethodError exception you've seen before. Checking the code, we do not explicitly set com.google.common.base.Preconditions as a dependency, and we probably should, but we do use com.google.common.base.Function, which is in the same jar.