Hello everyone,
I have some questions I wish to get answered regarding how HBase
and HDFS manage data nodes.
Q1 - Can I remove a node from a cluster without losing data?
Q2 - If yes (Q1), does that depend on the replication of data between
nodes, or do I not need to worry about it?
Hi Oussama.
1) That's the whole goal of Hadoop and HBase ;) You might want to read
Hadoop: The Definitive Guide and HBase: The Definitive Guide...
2) HBase is built on Hadoop and takes advantage of its replication process.
3) There is also a way to backup the data manually or to configure
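To add some concreteness to Q1: HDFS supports removing a datanode safely via decommissioning, which re-replicates the node's blocks elsewhere before the node is retired. A rough sketch of the usual procedure, assuming hdfs-site.xml already points dfs.hosts.exclude at an exclude file; the paths and hostname below are illustrative:

```shell
# hdfs-site.xml is assumed to contain something like:
#   <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/etc/hadoop/conf/dfs.exclude</value>
#   </property>

# 1. Add the node's hostname to the exclude file
echo "datanode-to-remove.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the namenode to re-read its host lists; it marks the node
#    "Decommission in progress" and re-replicates its blocks
hdfs dfsadmin -refreshNodes

# 3. Wait until the report shows the node as "Decommissioned",
#    then it is safe to shut it down
hdfs dfsadmin -report
```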
Hi,
You will most probably want to be more forthcoming if you are expecting some
help from the community.
Like:
Hi, I tried HBase version XXX. I did 'this', 'that' and 'that', and while
doing it I faced the issue below. Can you please let me know where I should
start to look? Thanks a lot for your help.
Hi, all,
My understandings about HBase table and its family are as follows.
1) Each table can consist of multiple families;
2) When retrieving with SingleColumnValueFilter, if the family is
specified, other families contained in the same table are not
affected.
Are these claims right? But I
Hi there,
On your second point, I don't think column family can ever be an optional
parameter, so I'm not sure this understanding is correct.
Regards,
Stas.
On 12 August 2013 17:22, Bing Li lbl...@gmail.com wrote:
Hey Kiru,
Another option for you may be to use Phoenix (
https://github.com/forcedotcom/phoenix). In particular, our skip scan may
be what you're looking for:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html.
Under-the-covers, the skip scan is doing a series of
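Conceptually, a skip scan seeks to each candidate key range in turn instead of filtering every row. A toy sketch of that idea (not Phoenix internals; a TreeMap stands in for HBase's sorted row keys):

```java
import java.util.*;

// Toy sketch of a skip scan: for each prefix, seek directly to the first
// matching key and stop as soon as keys leave that prefix's range.
// Illustrative only; not Phoenix or HBase code.
public class SkipScanSketch {
    static List<String> skipScan(NavigableMap<String, String> table,
                                 List<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (String prefix : prefixes) {
            // tailMap = seek to the first key >= prefix
            for (String key : table.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) break; // past this prefix: next seek
                hits.add(key);
            }
        }
        return hits;
    }
}
```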
Dear all,
I have one additional question about table and family.
Is a table with fewer families faster than one with more families,
if the amount of data they hold is the same? Correct or not?
Is it a higher-performance design to put fewer families in a table?
Thanks so much!
Hi JM,
After forcing major compactions on all tables, the locality index
crept up to ~100%. This means the table I suspected to be problematic
was actually fine, and some of the legacy tables on the cluster had a
high percentage of non-local blocks. A per-table version of
hdfsBlocksLocalityIndex
Hi Bing,
Generally it is not advised to have more than 2-3 column families, unless
you are using them completely independently of each other. Please see here:
http://hbase.apache.org/book/number.of.cfs.html
Thanks,
Stas
On 12 August 2013 18:00, Bing Li lbl...@gmail.com wrote:
Dear all,
I
Hi, there.
I ran some HBase workload. The job completed normally. HBase uses an
external ZK service. However, I saw this ZK exception showing up
repeatedly. Is this normal? Many thanks.
EndOfStreamException: Unable to read additional data from client sessionid
0xXXX, likely
Anyone know if the prefix filter[1] does a full table scan?
1 -
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
In the filterAllRemaining() method:

public boolean filterAllRemaining() {
  return passedPrefix;
}

In filterRowKey():

// if they are equal, return false => pass the row
// else return true => filter the row
// if we are past the prefix, set the flag
int cmp = Bytes.compareTo(buffer, offset, this.prefix.length,
    this.prefix, 0, this.prefix.length);
If you can mark a row by adding a column qualifier whose existence serves as
your flag, and whose name sorts lexicographically first, then it won't be as
slow as the filters you mention below.
On Monday, August 12, 2013, ccalugaru wrote:
Hi all,
I have the following hbase use case:
James,
We actually planned to use Phoenix for this project, but we did not have much
time to design on top of Phoenix.
Also, this app is more like a 'search' app and I wanted it to do just
key lookups. There are no writes and everything is in the block cache.
Still, yes, let me take a look at
Adding back user@
bq. does it jump directly to Prefix3
I don't think so.
Are your prefixes of fixed length ?
If so, take a look at FuzzyRowFilter.
Cheers
On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
skada...@bloomberg.net wrote:
Ted: Thanks for looking that
I'm willing to be told I'm completely wrong here, but it seems like the prefix
filter should be capable of using the same mechanism as a row-key lookup
or a scan with start and stop rows.
If HBase were like a hash table with no notion of sorted order, I could
understand a partial-key
Hi Sudarshan,
While using the prefix filter, you also have to set startRow() and
stopRow() for the behavior that you are expecting.
This kind of discussion has come up previously on the mailing list, yet no
changes have been made to the behavior of PrefixFilter.
Setting the startRow(Prefix3) will
A write in HDFS (by default) places one copy on the local datanode, another
on a node in a different rack (when applicable), and a third on a node in
the same rack as the second.
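That placement policy can be sketched as follows (a toy model, not the real HDFS API; method and variable names are made up for illustration):

```java
import java.util.*;

// Toy sketch of HDFS's default block placement: replica 1 on the writer's
// node, replica 2 on a node in a different rack, replica 3 on another node
// in that same remote rack. Illustrative only.
public class ReplicaPlacementSketch {
    static List<String> chooseReplicas(String writerNode,
                                       Map<String, List<String>> rackToNodes) {
        String writerRack = null;
        for (Map.Entry<String, List<String>> e : rackToNodes.entrySet()) {
            if (e.getValue().contains(writerNode)) writerRack = e.getKey();
        }
        List<String> replicas = new ArrayList<>();
        replicas.add(writerNode);  // replica 1: local datanode
        for (Map.Entry<String, List<String>> e : rackToNodes.entrySet()) {
            if (!e.getKey().equals(writerRack) && !e.getValue().isEmpty()) {
                replicas.add(e.getValue().get(0));      // replica 2: remote rack
                if (e.getValue().size() > 1) {
                    replicas.add(e.getValue().get(1));  // replica 3: same remote rack
                }
                break;
            }
        }
        return replicas;
    }
}
```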
HBase gets data locality by being co-located with the data nodes, so after a
compaction all blocks of the compacted
What Anil said.
Filters are executed per store (i.e. per region, per column family). So each
filter in each store would need to seek to the start row.
It is more efficient to let the scanner do that ahead of time by setting the
start row to the prefix.
We should document that if we haven't.
-- Lars
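The advice above (let the scanner seek to the prefix rather than making each store's filter do it) can be illustrated with a sorted map standing in for HBase's row ordering (toy code, not the HBase client API):

```java
import java.util.*;

// Toy sketch: seeking to the prefix (tailMap) replaces scanning from the
// first row, and the scan stops as soon as keys leave the prefix range.
// A TreeMap stands in for HBase's sorted row keys; illustrative only.
public class PrefixSeekSketch {
    static List<String> prefixScan(NavigableMap<String, String> table,
                                   String prefix) {
        List<String> hits = new ArrayList<>();
        // seek directly to the first key >= prefix
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break; // past the prefix: stop early
            hits.add(key);
        }
        return hits;
    }
}
```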
Now that I wrote this, I think we should improve that.
For example we could add an RPC to the regionserver and have the regionserver
who would own the region copy the appropriate part of the file (then the data
would be local). Or even simpler, instead of actually copying the files we
could
On Mon, Aug 12, 2013 at 9:58 PM, lars hofhansl la...@apache.org wrote: