On Thu, May 16, 2013 at 11:49 AM, Varun Sharma va...@pinterest.com wrote:
Hi,
I am wondering what happens when we add the following:
row, col, timestamp -- v1
A flush happens. Now, we add
row, col, timestamp -- v2
A flush happens again. In this case if MAX_VERSIONS == 1, how is the tie
We use the index blocks to find the right block (we have 64k blocks by
default). Once we've found the block we do a linear search for the KV we're
looking for.
In your example, you'd find the first block that contains a KV for row1c and
then seek into that block until you find your KV. We cannot do
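For illustration, a minimal sketch of the prefix scan under discussion, using the 0.94-era client API (the table name and prefix below are placeholders, not from the thread):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");
    byte[] prefix = Bytes.toBytes("row1c");
    Scan scan = new Scan(prefix);              // start the scan at the prefix, not at "row1"
    scan.setFilter(new PrefixFilter(prefix));  // stop once rows no longer match the prefix
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

Server-side, the HFile block index is used to jump to the first block that could contain "row1c", and the KVs inside that block are then walked linearly, as described above.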
Hi Viral,
some questions:
Are you adding new data or deleting data over time?
Do you have bloom filters enabled?
Which version of Hadoop?
Anything funny in the Datanode logs?
-- Lars
- Original Message -
From: Viral Bajaria viral.baja...@gmail.com
To: user@hbase.apache.org
Thanks for all the help in advance!
Answers inline..
Hi Viral,
some questions:
Are you adding new data or deleting data over time?
Yes, I am continuously adding new data. The puts have not slowed down, but
that could also be an aftereffect of deferred log flush.
Do you have bloom
On Thu, May 16, 2013 at 3:26 PM, Varun Sharma va...@pinterest.com wrote:
Referring to your comment above again
If you're doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...)
I
Thanks Stack and Lars for the detailed answers - This question is not
really motivated by performance problems...
So the index indeed knows what part of the HFile key is the row and which
part is the column qualifier. That's what I needed to know. I initially
thought it saw it as an opaque
Generally we start by seeking into all the HFiles corresponding to the
region and loading the blocks that correspond to the row key specified in the
scan.
If row1 and row1c are in the same block then we may start with row1. If
they are in different blocks then we will start with the block
Is it a bug or part of the design? It seems more like a design decision to me. Can someone
walk me through the purpose of this feature?
Thanks
Rishabh
From: Rishabh Agrawal
Sent: Friday, May 17, 2013 4:24 PM
To: user@hbase.apache.org
Subject: Doubt Regarding HLogs
Hello,
I am working with HLogs of HBase
That's HDFS.
When a file is being written, its size is not known, as the write is in
progress. So the namenode reports a size of zero (more exactly, it does not
take into account the HDFS block being written when it calculates the
size). When you read, you go to the datanode owning the data,
Thanks Nicolas,
When will this file be finalized? Is it time bound? Or will it always be
zero for the last one (even if it contains data)?
-Original Message-
From: Nicolas Liochon [mailto:nkey...@gmail.com]
Sent: Friday, May 17, 2013 4:39 PM
To: user
Subject: Re: Doubt Regarding HLogs
In this situation, you can set the property

  <property>
    <name>hbase.regionserver.logroll.period</name>
    <value>360</value>
  </property>

to a short value, let's say 3000, and then you can see your log file with
current size after 3 seconds.
To Nicolas,
I guess he wants to analyze the HLog somehow.
Hi all,
we are seeing very strange behavior of HBase (version 0.90.6-cdh3u5) in
the following scenario:
1) Open scanner and start scanning.
2) Check order of returned keys (a simple test that the next key is
lexicographically greater than the previous one).
3) The check may occasionally fail.
Hi Jan,
0.90.6 is a very old version of HBase... Will you have a chance to migrate
to a more recent one? Most of these issues have probably already been fixed.
JM
2013/5/17 Jan Lukavský jan.lukav...@firma.seznam.cz
Hi all,
we are seeing very strange behavior of HBase (version 0.90.6-cdh3u5)
Yes, it's by design.
The last log file is the one being written by HBase. The safe option is to
wait for this file to be closed by HBase. As Yong said, you can change the
roll parameter if you want it to be terminated sooner, but changing this
parameter impacts the HDFS namenode load. 10 minutes is
Hi,
I wonder if there is a tool similar
to org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a tsv
file and creates HFiles which are ready to be loaded into the corresponding
region by another
tool, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want
is to read from
bq. What I want is to read from some hbase table and create hfiles directly
Can you describe your use case in more detail?
Thanks
On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou zhou.jiny...@gmail.comwrote:
Hi,
I wonder if there are tool similar
to
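One common pattern for this, sketched below with made-up table and path names (not from the thread): a MapReduce job that scans the existing table through TableMapper and writes HFiles via HFileOutputFormat.configureIncrementalLoad, so the output can be bulk-loaded afterwards.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableToHFiles {
  // Re-emit every KeyValue of each scanned row; a real job would transform them here.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      for (KeyValue kv : result.raw()) {
        ctx.write(row, kv);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-hfiles");
    job.setJarByClass(TableToHFiles.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache

    TableMapReduceUtil.initTableMapperJob("mytable", scan, CopyMapper.class,
        ImmutableBytesWritable.class, KeyValue.class, job);

    // Sets up the partitioner and reducer so HFiles line up with region boundaries.
    HTable targetTable = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, targetTable);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The resulting directory of HFiles can then be handed to org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles (the completebulkload tool), just like ImportTsv output.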
Look at how much hard disk utilization you have (IOPS / svctm). You may
just be under-scaled for the QPS you desire for the combined read + write load. If
you are performing random gets, you can expect roughly low to mid hundreds of
IOPS per HDD. Use bonnie++ / IOzone / ioping to verify.
Also you
Actually, I want to update each row of a table each day. No new data is
needed; only some values will be changed by recalculation. It looks like
every time I do this, the data in the table doubles, even though it is an update. I
believe even an update will result in new HFiles and the cluster is then
very
If I understood your use case correctly: if you don't need to maintain
older versions of the data, why don't you set the 'max versions' parameter
for your table to 1? I believe that the increase in data, even in the case of
updates, is due to that (?). Have you tried that?
Regards,
Shahab
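For reference, a minimal sketch of that change with the 0.94-era Java admin API (the table and family names are placeholders; older cell versions are only physically removed at the next major compaction):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SetMaxVersions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Fetch the existing family descriptor so its other settings are preserved.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor family = desc.getFamily(Bytes.toBytes("cf"));
    family.setMaxVersions(1);          // keep only the newest cell per row/column
    admin.disableTable("mytable");     // schema changes are safest on a disabled table
    admin.modifyColumn("mytable", family);
    admin.enableTable("mytable");
    admin.close();
    // The extra versions on disk go away once a major compaction rewrites the HFiles.
  }
}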
On Fri,
Anil,
Yes, everything is in the Phoenix GitHub repo. Will give you more detail
of specific packages and classes off-list.
Thanks,
James
On 05/16/2013 05:33 PM, anil gupta wrote:
Hi James,
Is this implementation present in the GitHub repo of Phoenix? If yes, can
you provide me the package
Jinyuan:
bq. no new data needed, only some value will be changed by recalculation.
Have you considered using a coprocessor to fulfill the above task?
Cheers
On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus shahab.yu...@gmail.comwrote:
If I understood your usecase correctly, then if you don't
Yes, bloom filters have been enabled: ROWCOL
Can you try with a ROW bloom?
-Anoop-
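For reference, a minimal sketch of switching the family to a ROW bloom with the 0.94-era admin API (table and family names are placeholders; existing HFiles keep the blooms they were written with until flushes/compactions rewrite them):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class SetRowBloom {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor family = desc.getFamily(Bytes.toBytes("cf"));
    family.setBloomFilterType(StoreFile.BloomType.ROW);  // row-level blooms instead of ROWCOL
    admin.disableTable("mytable");
    admin.modifyColumn("mytable", family);
    admin.enableTable("mytable");
    admin.close();
    // New blooms apply only to HFiles written after the change.
  }
}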
On Fri, May 17, 2013 at 12:20 PM, Viral Bajaria viral.baja...@gmail.comwrote:
Thanks for all the help in advance!
Answers inline..
Hi Viral,
some questions:
Are you adding new data or deleting data
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll phobos...@gmail.com wrote:
Look at how much Hard Disk utilization you have (IOPS / Svctm). You may
just be under scaled for the QPS you desire for both read + write load. If
you are performing random gets, you could expect around the low to mid
I had thought about coprocessors. But I had the impression that a coprocessor
is the last option one should try because it is so invasive to the JVM running
HBase. Not sure about the current status though. However, what the coprocessor
can give me in this case is less network load. My problem is HBase's
Will try that. Thanks,
On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus shahab.yu...@gmail.comwrote:
If I understood your usecase correctly, then if you don't need to maintain
older versions of data then why don't you set the 'max version' parameter
for your table to 1? I believe that the
Hi all,
I have been trying to run a MapReduce job that uses HBase as both source and
sink. I have HBase 0.94.2 and Hadoop 2.0 installed using the Cloudera repository,
following their instructions.
When I use the HBase client package version 0.94.2 or above, it gives the following
DNS-related
We have some meetups happening over the next few months. Sign up if you
are interested in attending (or if you would like to present, write me
off-list).
First up, there is hbasecon2013 (http://hbasecon.com) on June 13th in SF.
It is shaping up to be a great community day out with a bursting
This has come up in the past:
http://search-hadoop.com/m/mDn0i2kjGA32/NumberFormatException+dnssubj=unable+to+resolve+the+DNS+name
Or check out this old thread:
http://mail.openjdk.java.net/pipermail/jdk7-dev/2010-October/001605.html
St.Ack
On Fri, May 17, 2013 at 11:17 AM, Heng Sok