HBase/HDFS Data Nodes Management

2013-08-12 Thread Oussama Jilal
Hello everyone, I have some questions that I wish to get answers to regarding how HBase and HDFS manage the data nodes. Q1- Can I remove a node from a cluster without losing data? Q2- If yes (Q1), does that depend on the replication of data between nodes, or do I not need to worry about it

Re: HBase/HDFS Data Nodes Management

2013-08-12 Thread Jean-Marc Spaggiari
Hi Oussama. 1) That's the whole goal of Hadoop and HBase ;) You might want to read Hadoop: The Definitive Guide and HBase: The Definitive Guide... 2) HBase is based on Hadoop and takes advantage of its replication process. 3) There is also a way to back up the data manually or to configure
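
For Q1 and Q2: a node can be removed without losing data only if every block it holds still has a replica on some other node. The sketch below models that check over a hypothetical map from block IDs to the DataNodes holding replicas. The class and method names are illustrative, not an HDFS API. In a real cluster, HDFS decommissioning re-replicates a node's blocks before retiring it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: what happens to each block if one DataNode disappears? */
class NodeRemovalCheck {

    /** Blocks left with no replica at all (data loss) if {@code node} is removed. */
    static List<String> blocksLost(Map<String, Set<String>> replicas, String node) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : replicas.entrySet()) {
            Set<String> remaining = new HashSet<>(e.getValue());
            remaining.remove(node);
            if (remaining.isEmpty()) {
                lost.add(e.getKey());
            }
        }
        Collections.sort(lost);
        return lost;
    }

    /** Blocks that survive but drop below the target replication factor. */
    static List<String> blocksUnderReplicated(Map<String, Set<String>> replicas,
                                              String node, int replication) {
        List<String> under = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : replicas.entrySet()) {
            Set<String> remaining = new HashSet<>(e.getValue());
            remaining.remove(node);
            if (!remaining.isEmpty() && remaining.size() < replication) {
                under.add(e.getKey());
            }
        }
        Collections.sort(under);
        return under;
    }
}
```

In this model, a block written with the default replication factor of 3 survives the loss of any single node but becomes under-replicated. A block written with replication 1 is lost as soon as its only node goes away. This is why removing a node safely depends on replication.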

Re: HBase Test issue

2013-08-12 Thread Jean-Marc Spaggiari
Hi, You will most probably want to be more talkative if you are expecting some help from the community. Like: Hi, I tried HBase version XXX. I did 'this', 'that' and 'that', and while doing it I faced the issue below. Can you please let me know where I should start to look? Thanks a lot for your help.

Table and Family

2013-08-12 Thread Bing Li
Hi, all, My understanding of HBase tables and their families is as follows. 1) Each table can consist of multiple families; 2) When retrieving with SingleColumnValueFilter, if the family is specified, other families contained in the same table are not affected. Are these claims right? But I

Re: Table and Family

2013-08-12 Thread Stas Maksimov
Hi there, On your second point, I don't think column family can ever be an optional parameter, so I'm not sure this understanding is correct. Regards, Stas. On 12 August 2013 17:22, Bing Li lbl...@gmail.com wrote: Hi, all, My understandings about HBase table and its family are as follows.

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread James Taylor
Hey Kiru, another option for you may be to use Phoenix (https://github.com/forcedotcom/phoenix). In particular, our skip scan may be what you're looking for: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html. Under the covers, the skip scan is doing a series of
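
A skip scan differs from a filter that inspects every row. It seeks directly to the next key that could match and does not read the rows in between. The sketch below illustrates that general idea over an in-memory sorted map using `TreeMap.ceilingKey`. It is not Phoenix's implementation. The class name and the `String` keys (standing in for HBase's `byte[]` row keys) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;

/** Hypothetical illustration of seek-based ("skip") scanning over sorted row keys. */
class SkipScanSketch {

    /**
     * Returns every key that starts with one of the given prefixes.
     * Assumes the prefixes do not overlap (e.g. not both "a" and "ab").
     */
    static List<String> skipScan(NavigableMap<String, String> table, SortedSet<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (String prefix : prefixes) {
            // Seek straight to the first key >= prefix instead of reading the rows in between.
            String key = table.ceilingKey(prefix);
            // Read only while keys still carry this prefix, then seek to the next prefix.
            while (key != null && key.startsWith(prefix)) {
                hits.add(key);
                key = table.higherKey(key);
            }
        }
        return hits;
    }
}
```

With keys `a1, a2, b1, b2, c1, d1` and prefixes `a` and `c`, the sketch reads only the matching rows plus one boundary check per prefix and never touches `b1` or `b2`.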

Performance Are Affected? - Table and Family

2013-08-12 Thread Bing Li
Dear all, I have one additional question about tables and families. Is a table with fewer families faster than one with more families if both hold the same amount of data? Correct or not? Is it a higher-performance design to put fewer families into a table? Thanks so much!

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread Scott Kuehn
Hi JM, After forcing major compactions on all tables, the locality index crept up to ~100%. This means the table I suspected to be problematic was actually fine, and some of the legacy tables on the cluster had a high percentage of non-local blocks. A per-table version of hdfsBlocksLocalityIndex
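
A block locality index can be read as the fraction of a region's bytes whose HDFS blocks have a replica on the server hosting the region. The sketch below computes that byte-weighted fraction. It is loosely modeled on the idea behind HBase's locality metric, but the class and its `Block` type are hypothetical simplifications, not HBase code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified block locality calculation (byte-weighted). */
class LocalitySketch {

    /** One HDFS block of a store file: its size and the hosts holding a replica. */
    static final class Block {
        final long length;
        final Set<String> hosts;

        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /** Fraction of bytes (0.0 to 1.0) whose block has a replica on {@code host}. */
    static double localityIndex(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.length;
            if (b.hosts.contains(host)) {
                local += b.length;
            }
        }
        return total == 0 ? 0.0 : (double) local / total;
    }
}
```

A major compaction rewrites a region's store files from the region server that hosts it, and HDFS places the first replica of each new block on the writer's own DataNode. In this model, that moves a region from partially local (for example 0.5) to fully local (1.0), consistent with the index climbing toward ~100% after the forced compactions.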

Re: Performance Are Affected? - Table and Family

2013-08-12 Thread Stas Maksimov
Hi Bing, Generally it is not advised to have more than 2-3 column families, unless you are using them completely independently of each other. Please see here: http://hbase.apache.org/book/number.of.cfs.html Thanks, Stas On 12 August 2013 18:00, Bing Li lbl...@gmail.com wrote: Dear all, I

Is repeatedly seeing ZK EndOfStreamException normal?

2013-08-12 Thread Dongcai Shen / Xiaoli Shen
Hi, there. I ran an HBase workload, and the job completed normally. HBase uses an external ZK service. However, I saw this ZK exception showing up repeatedly. Is this normal? Many thanks. EndOfStreamException: Unable to read additional data from client sessionid 0xXXX, likely

PrefixFilter

2013-08-12 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
Anyone know if the prefix filter[1] does a full table scan? 1 - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html

Re: PrefixFilter

2013-08-12 Thread Ted Yu
In filterAllRemaining() method: public boolean filterAllRemaining() { return passedPrefix; } In filterRowKey(): // if they are equal, return false => pass row // else return true, filter row // if we are passed the prefix, set flag int cmp = Bytes.compareTo(buffer,
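
The quoted logic can be followed in a simplified standalone model. When the scan has no start row, every row sorting before the prefix is still read and filtered out. Matching rows pass. The first row sorting after the prefix sets `passedPrefix`, so `filterAllRemaining()` ends the scan early. The sketch below is a hypothetical model over `String` keys, not the HBase class itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Standalone model of PrefixFilter's row-key logic over String keys (not the HBase class). */
class PrefixFilterModel {
    private final String prefix;
    private boolean passedPrefix = false;
    int rowsExamined = 0; // how many rows the modeled scan had to read

    PrefixFilterModel(String prefix) {
        this.prefix = prefix;
    }

    /** Like filterAllRemaining(): true once the scan has moved past the prefix. */
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    /** Like filterRowKey(): returns true if the row should be filtered out. */
    boolean filterRowKey(String row) {
        rowsExamined++;
        if (row.length() < prefix.length()) {
            return true; // too short to carry the prefix
        }
        int cmp = row.substring(0, prefix.length()).compareTo(prefix);
        if (cmp > 0) {
            passedPrefix = true; // rows are sorted, so no later row can match
        }
        return cmp != 0; // equal => pass the row, otherwise filter it
    }

    /** Drives the filter the way a scan with no start row would: from the first row. */
    static List<String> scan(SortedSet<String> rows, PrefixFilterModel filter) {
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            if (filter.filterAllRemaining()) {
                break; // stop early once past the prefix
            }
            if (!filter.filterRowKey(row)) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

Scanning `a1, a2, b1, b2, c1, c2` for prefix `b` in this model reads five rows: the two `a` rows are read and discarded, and only `c2` is skipped. In other words, the filter stops after the prefix but does not seek to it.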

Re: Hbase update use case

2013-08-12 Thread Asaf Mesika
If you can mark a row by adding a column qualifier that acts as your flag by its mere existence, and whose name sorts lexicographically first, then it won't be as slow as you said about filters below. On Monday, August 12, 2013, ccalugaru wrote: Hi all, I have the following hbase use case:
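
Asaf's suggestion relies on cells within one row and column family being sorted by qualifier in unsigned lexicographic byte order. A marker qualifier chosen to sort first is therefore the first cell a row-level check encounters. The sketch below models that ordering with a `TreeSet` and an unsigned byte comparator. It assumes the marker lives in the same column family as the other qualifiers, since cells are ordered by family first. The qualifier names and the leading `\u0000` byte of the marker are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical model of how qualifiers within one row and family are ordered. */
class QualifierOrder {

    /** Unsigned lexicographic comparison of byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // treat bytes as unsigned 0..255
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix sorts first
    }

    static byte[] q(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Qualifier of the first cell a scanner would see for this row and family. */
    static String firstQualifier(Collection<byte[]> qualifiers) {
        TreeSet<byte[]> sorted = new TreeSet<>(QualifierOrder::compare);
        sorted.addAll(qualifiers);
        return new String(sorted.first(), StandardCharsets.UTF_8);
    }
}
```

Because the zero byte sorts before every other byte, a marker such as `"\u0000flag"` comes before `"age"` or `"name"`. A check that only needs to know whether the marker exists can therefore decide from the first cell of the row.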

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread Kiru Pakkirisamy
James, We actually planned to use Phoenix for this project, but we did not have much time to design on top of Phoenix. Also, this app is more like a 'search' app, and I wanted it to do just key lookups. There are no writes, and everything is in the block cache. Still, yes, let me take a look at

Re: PrefixFilter

2013-08-12 Thread Ted Yu
Adding back user@ bq. does it jump directly to Prefix3 I don't think so. Are your prefixes of fixed length? If so, take a look at FuzzyRowFilter. Cheers On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) skada...@bloomberg.net wrote: Ted: Thanks for looking that
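
FuzzyRowFilter works on fixed-length key components. It matches row keys against a template plus a mask, where a mask byte of 0 marks a position that must match and 1 marks a position that may hold any byte. The standalone sketch below models only that matching rule, treating the template as applying to the leading bytes of the row key. The real filter also uses the template to compute seek hints so it can skip non-matching ranges. The class and method names are hypothetical.

```java
/** Standalone model of fuzzy row-key matching: mask 0 = fixed byte, 1 = any byte. */
class FuzzyMatchSketch {

    static boolean matches(byte[] row, byte[] template, byte[] mask) {
        if (template.length != mask.length) {
            throw new IllegalArgumentException("template and mask must have equal length");
        }
        if (row.length < template.length) {
            return false; // row too short to contain the fixed-length components
        }
        for (int i = 0; i < template.length; i++) {
            // Fixed positions must equal the template; wildcard positions are ignored.
            if (mask[i] == 0 && row[i] != template[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with a 4-byte ID followed by a fixed `_A` suffix, a mask of `{1, 1, 1, 1, 0, 0}` accepts `1234_A` and rejects `1234_B`, whatever the ID bytes are.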

Re: PrefixFilter

2013-08-12 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
I'm willing to be told I'm completely wrong here, but it seems like the prefix filter should be capable of using the same mechanism used in a row-key lookup or a scan with a start and stop row. If HBase were like a hash table with no notion of sortedness, I can understand a partial-key

Re: PrefixFilter

2013-08-12 Thread anil gupta
Hi Sudarshan, While using the prefix filter, you also have to set the startRow and stopRow to get the behavior that you are expecting. This has been discussed previously on the mailing list, yet no changes have been made to the behavior of PrefixFilter. Setting the startRow(Prefix3) will

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread lars hofhansl
A write in HDFS (by default) places one copy on the local datanode, another one on a node in a different rack (when applicable), and a third one on another node in that same remote rack. HBase gets data locality by being co-located with the data nodes, so after a compaction all blocks of the compacted
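
To make the default placement concrete, here is a hypothetical, simplified model. It assumes the writer runs on a DataNode, takes a made-up node-to-rack map, and picks the first eligible node in name order where HDFS would choose randomly among eligible nodes. It only shows why the region server that writes a file (for example during a compaction) ends up holding one replica of every block locally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Simplified, deterministic model of HDFS's default three-replica placement. */
class PlacementSketch {

    /**
     * @param rackOf hypothetical map from DataNode name to rack name
     * @param writer the DataNode the writing client (e.g. a region server) runs on
     * @return up to three DataNodes chosen to hold the block
     */
    static List<String> place(SortedMap<String, String> rackOf, String writer) {
        List<String> chosen = new ArrayList<>();
        // 1st replica: the writer's own DataNode, which is what gives HBase locality.
        chosen.add(writer);
        String localRack = rackOf.get(writer);

        // 2nd replica: a node on a different rack, if the cluster has more than one rack.
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                chosen.add(e.getKey());
                break;
            }
        }

        // Remaining replicas: other nodes on the 2nd replica's rack (local rack if single-rack).
        String targetRack = chosen.size() > 1 ? rackOf.get(chosen.get(1)) : localRack;
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (chosen.size() == 3) {
                break;
            }
            if (!chosen.contains(e.getKey()) && e.getValue().equals(targetRack)) {
                chosen.add(e.getKey());
            }
        }

        // Fallback: fill any slot still open with an unused node anywhere.
        for (String node : rackOf.keySet()) {
            if (chosen.size() == 3) {
                break;
            }
            if (!chosen.contains(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

With racks `r1 = {dn1, dn2}` and `r2 = {dn3, dn4}`, a write from `dn1` lands on `dn1`, `dn3` and `dn4` in this model. The first entry is always the writer, which is the co-location HBase relies on.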

Re: PrefixFilter

2013-08-12 Thread lars hofhansl
What Anil said. Filters are executed per Store (i.e. per region per column family), so each filter in each store would need to seek to the start row. It is more efficient to let the scanner do that ahead of time by setting the start row to the prefix. We should document that if we haven't. -- Lars
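
Setting the start row to the prefix lets the scanner seek there directly, and a stop row bounds the scan on the other side. One common way to derive an exclusive stop row is to increment the last byte of the prefix that is not 0xFF and drop everything after it. If every byte is 0xFF, there is no finite upper bound. The helper below is a hypothetical sketch of that calculation. Its results would be passed to `Scan.setStartRow` and `Scan.setStopRow` alongside the `PrefixFilter`.

```java
import java.util.Arrays;

/** Hypothetical helper: computes the exclusive stop row for a prefix scan. */
class PrefixStopRow {

    /**
     * Returns the smallest row key greater than every key starting with {@code prefix},
     * or an empty array (meaning "scan to the end of the table") if none exists.
     */
    static byte[] stopRowFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xFF) != 0xFF) {
                // Keep bytes up to i and bump byte i; unsigned order makes 0x7F -> 0x80 correct.
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0]; // all 0xFF: no finite stop row
    }
}
```

For the prefix `ab`, this gives the range `[ab, ac)`. For a prefix ending in 0xFF, such as `a\xFF`, the stop row becomes `b`.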

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread lars hofhansl
Now that I wrote this, I think we should improve that. For example we could add an RPC to the regionserver and have the regionserver that would own the region copy the appropriate part of the file (then the data would be local). Or even simpler, instead of actually copying the files we could

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread Elliott Clark
On Mon, Aug 12, 2013 at 9:58 PM, lars hofhansl la...@apache.org wrote: For example we could add an RPC to the regionserver and have the regionserver who would own the region copy the appropriate part of the file (then the data would be local). Or even simpler, instead of actually copying the