Re: HBase failing to restart in single-user mode

2015-05-18 Thread Viral Bajaria
Same for me, I had faced similar issues especially on my virtual machines since I would restart them more often than my host machine. Moving ZK from /tmp which could get cleared on reboots fixed the issue for me. Thanks, Viral On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com

Re: AsyncHBase 1.5.0 has been released

2014-01-29 Thread Viral Bajaria
when decoding fails with an uncaught exception. Release 1.5.0. Brandon Forehand (1): Add support for prefetching the meta region. Phil Smith (1): Here's some one-liners to compile and run tests. St.Ack (1): Make mvn build accomodate protobuf files Viral Bajaria

Re: Online/Realtime query with filter and join?

2013-12-02 Thread Viral Bajaria
Pradeep, correct me if I am wrong but prestodb has not released the HBase plugin as yet or they did and maybe I missed the announcement ? I agree with what Doug is saying here, you can't achieve 100ms on every kind of query on HBase unless and until you design the rowkey in a way to help you

Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)

2013-12-02 Thread Viral Bajaria
. Viral Bajaria (2): Initial commit for ScanFilter. Add more scanner filters. Xun Liu (1): Properly honor timestamps in DeleteRequest. -- Benoit tsuna Sigoure

Re: Scan performance

2013-08-08 Thread Viral Bajaria
Hi Tony, I know it's been a while and am not sure if you already figured out the issue but try taking a look at HBASE-9079 and see if it's similar to the problem that you are facing with FuzzyRowFilter. I have attached a patch to that ticket too and have verified that it fixed things for me in

Re: FilterList: possible bug in getNextKeyHint

2013-07-29 Thread Viral Bajaria
Attached are 2 patches: one of them is TestFail.patch where I show that the behavior is not as expected. On the other hand, the second patch is with the changes that I did to FilterList and the behavior is as expected. I have tested the state maintenance on two filters that implement

Re: FilterList: possible bug in getNextKeyHint

2013-07-29 Thread Viral Bajaria
Attached the two test patches to this JIRA: https://issues.apache.org/jira/browse/HBASE-9079 On Mon, Jul 29, 2013 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote: Can you log a JIRA and attach the patches there ? Your attachments did not go through.

FilterList: possible bug in getNextKeyHint

2013-07-28 Thread Viral Bajaria
Hi, I hit a weird issue/bug and am able to reproduce the error consistently. The problem arises when FilterList has two filters where each implements the getNextKeyHint method. The way the current implementation works is, StoreScanner will call matcher.getNextKeyHint() whenever it gets a

Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
Thanks guys for going through that never-ending email! I will create the JIRA for block cache eviction and the regionserver assignment command. Ted already pointed to the JIRA which tries to go to a different datanode if the primary is busy (I will add comments to that one). To answer Andrew's

Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
I was able to reproduce the same regionserver asking for the same local block over 300 times within the same 2 minute window by running one of my heavy workloads. Let me try and gather some stack dumps. I agree that jstack crashing the jvm is concerning but there is nothing in the errors to know

Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
Good question. When I looked at the logs, it's not clear from them whether it's reading a meta or data block. Is there any kind of log line that indicates that ? Given that it's saying that it's reading from a startOffset I would assume this is a data block. A question that comes to mind, is this

Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
We haven't disabled the block cache. So I doubt that's the problem. On Mon, Jul 8, 2013 at 4:50 PM, Varun Sharma va...@pinterest.com wrote: FYI, if u disable your block cache - you will ask for Index blocks for every single request. So such a high rate of request is plausible for Index blocks even

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Viral Bajaria
Asaf, the hdfsBlocksLocalityIndex is around 76 and it's 86 for the regionserver which is under the heaviest load for IO. Ram, I saw that you updated the JIRA saying the checksum metrics are available in the regionserver. What group are they published under ? I checked my ganglia stats and can't

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Viral Bajaria
I saw the same code and also saw the following in RegionServerMetrics.java /** * Number of times checksum verification failed. */ public final MetricsLongValue checksumFailuresCount = new MetricsLongValue("checksumFailuresCount", registry); The registry is then registered in JMX via: //

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Viral Bajaria
Yes I was checking 0.94 code. And sorry for the brain fart, I just spotted the metric in ganglia. There are just too many metrics in ganglia and I skipped this one! It was under the group hbase.regionserver, while I was expecting it to be hbase.regionserver.RegionServerStatistics. The chart shows

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Viral Bajaria
No worries, Anoop. Here is some clarification for this chain. It started initially to figure out how to check whether SCR is effective at the RS or not. I could not find the metric anywhere in ganglia/JMX and didn't find any RegionServer level metric either and so started looking at my DN logs. I

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-05 Thread Viral Bajaria
Sweet! enabled debug logging for org.apache.hadoop.hdfs.DFSClient and found the New BlockReaderLocal log line. Got some verification that SCR is ON and working fine. Regarding no clienttrace lines in DN, I verified that too. Last time I saw a few lines since I forgot to remove HDFS_WRITE lines.

question about clienttrace logs in hdfs and shortcircuit read

2013-07-04 Thread Viral Bajaria
Hi, If I have enabled shortcircuit reads, should I ever be seeing clienttrace logs in the datanode for the regionserver DFSClient that is co-located with the datanode ? Besides that is there any other way to verify that my setting for short circuit reads is working fine. Thanks, Viral

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-04 Thread Viral Bajaria
I looked up the ganglia metrics that I have setup for the cluster (both HBase and HDFS) and don't see it there. Is it not published to ganglia ? On Wed, Jul 3, 2013 at 11:33 PM, Asaf Mesika asaf.mes...@gmail.com wrote: I think there is a metric in HBase and HDFS (JMX) reflecting that. If you

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-04 Thread Viral Bajaria
Currently datanode shows a lot of clienttrace logs for DFSClient. I did a quick command line check to see how many clienttrace lines I get per active RegionServer and it seems the local RegionServer had very few (< 1%). Given that datanode logs are too noisy with clienttrace, I was hoping to find the

Re: question about clienttrace logs in hdfs and shortcircuit read

2013-07-04 Thread Viral Bajaria
Created the JIRA at: https://issues.apache.org/jira/browse/HBASE-8868 Sorry if I got a few fields wrong, will learn from this one to open better JIRAs going forward. Thanks, Viral On Thu, Jul 4, 2013 at 2:02 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: I think we should

Re: HBASE-7846 : is it safe to use on 0.94.4 ?

2013-07-03 Thread Viral Bajaria
I ended up writing a tool which helps merge the table regions into a target # of regions. For example if you want to go from N -> N/8, then the tool figures out the grouping and merges them in one pass. I will put it up in a github repo soon and share it here. The sad part of this approach is the
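The grouping step described above (going from N regions to a smaller target count) can be sketched as follows. This is only an illustration, not the tool from the message: the class name RegionMergePlan and the method plan are hypothetical, regions are represented by their index in start-key order, and only adjacent regions are grouped together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split N key-ordered regions into a target number
// of contiguous groups, each of which would be merged into one region.
final class RegionMergePlan {

    /**
     * Returns one {firstIndex, lastIndex} pair (inclusive) per target region.
     * Group sizes differ by at most one when regionCount is not a multiple
     * of targetCount.
     */
    static List<int[]> plan(int regionCount, int targetCount) {
        if (targetCount < 1 || targetCount > regionCount) {
            throw new IllegalArgumentException("targetCount must be in [1, regionCount]");
        }
        List<int[]> groups = new ArrayList<>();
        int base = regionCount / targetCount;   // minimum group size
        int extra = regionCount % targetCount;  // first 'extra' groups get one more
        int start = 0;
        for (int g = 0; g < targetCount; g++) {
            int size = base + (g < extra ? 1 : 0);
            groups.add(new int[] {start, start + size - 1});
            start += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Example: 64 regions down to 64/8 = 8 regions.
        for (int[] g : plan(64, 8)) {
            System.out.println("merge regions " + g[0] + ".." + g[1]);
        }
    }
}
```

Keeping each group contiguous in key order means every merged region still covers a single unbroken key range.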

Re: HBASE-7846 : is it safe to use on 0.94.4 ?

2013-07-03 Thread Viral Bajaria
Found this while going through the online merge jira... https://issues.apache.org/jira/browse/HBASE-8217 The comments were interesting and I, as a user, would agree to the fact that supplying a patch is good and it's on me to decide whether I should use it or not. The core committee obviously is

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40

HBASE-7846 : is it safe to use on 0.94.4 ?

2013-07-01 Thread Viral Bajaria
Hi, Just wanted to check if it's safe to use the JIRA mentioned in the subject i.e. https://issues.apache.org/jira/browse/HBASE-7846 Thanks, Viral

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote: Sorry for the typo .. please ignore previous mail.. Here is the corrected one.. 1)I have around 140 columns for each row , out of 140 , around 100 columns hold java primitive data type , remaining 40 columns contain

Re: how can i do compaction manually?

2013-06-30 Thread Viral Bajaria
You can use hbase shell and run major_compact 'tablename' or you could run echo "major_compact 'tablename'" | hbase shell On Sun, Jun 30, 2013 at 7:51 PM, ch huang justlo...@gmail.com wrote: i want clean the data that is deleted ,question is which command i can execute on commandline? thanks

Re: 答复: flushing + compactions after config change

2013-06-28 Thread Viral Bajaria
On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores. That's weird, because I see some of my regions which receive

flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Hi All, I wanted some help on understanding what's going on with my current setup. I updated from config to the following settings: <property> <name>hbase.hregion.max.filesize</name> <value>107374182400</value> </property> <property> <name>hbase.hregion.memstore.block.multiplier</name>

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks for the quick response Anoop. The current memstore reserved (IIRC) would be 0.35 of total heap right ? The RS total heap is 10231MB, used is at 5000MB. Total number of regions is 217 and there are approx 150 regions with 2 families, ~60 with 1 family and remaining with 3 families. How to
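The numbers quoted in this message can be worked through as a rough back-of-the-envelope check. This is a minimal sketch under stated assumptions: the 0.35 fraction is the poster's recollection (IIRC), the region counts are the approximate ones given above, one memstore per column family per region is assumed, and the class and method names are hypothetical.

```java
// Hypothetical sketch: estimate how much heap each memstore gets on average
// when the global memstore budget is shared across all regions and families.
final class MemstoreBudget {

    /** Total memstores, assuming one memstore per column family per region. */
    static long memstoreCount(int regions, int twoFamilyRegions, int oneFamilyRegions) {
        int threeFamilyRegions = regions - twoFamilyRegions - oneFamilyRegions;
        return 2L * twoFamilyRegions + oneFamilyRegions + 3L * threeFamilyRegions;
    }

    /** Heap reserved for memstores in MB (integer arithmetic, rounded down). */
    static long budgetMb(long heapMb, int percent) {
        return heapMb * percent / 100;
    }

    public static void main(String[] args) {
        long stores = memstoreCount(217, 150, 60); // approximate counts from the message
        long budget = budgetMb(10231, 35);         // 0.35 of a 10231MB heap
        System.out.println("memstores=" + stores
                + " budgetMb=" + budget
                + " avgMbPerMemstore=" + (budget / stores));
    }
}
```

Under these assumptions there are 381 memstores sharing about 3580 MB, or roughly 9 MB each if they all filled evenly, which gives a sense of how much room there is before global memstore pressure forces flushes.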

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks Liang! Found the logs. I had gone overboard with my grep's and missed the Too many hlogs line for the regions that I was trying to debug. A few sample log lines: 2013-06-27 07:42:49,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush
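Since these lines are easy to miss with an over-eager grep, here is a minimal sketch of pulling the counters out of a "Too many hlogs" line like the sample above. The class name TooManyHlogs and the method parse are hypothetical, and the pattern is based only on the one sample line quoted in this message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract logs/maxlogs from a regionserver log line.
final class TooManyHlogs {

    // Pattern assumed from the single sample line quoted in the thread.
    private static final Pattern LINE =
            Pattern.compile("Too many hlogs: logs=(\\d+), maxlogs=(\\d+)");

    /** Returns {logs, maxlogs}, or null if the line is not a "Too many hlogs" line. */
    static int[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    public static void main(String[] args) {
        String sample = "2013-06-27 07:42:49,602 INFO "
                + "org.apache.hadoop.hbase.regionserver.wal.HLog: "
                + "Too many hlogs: logs=33, maxlogs=32; forcing flush";
        int[] counts = parse(sample);
        System.out.println("logs=" + counts[0] + " maxlogs=" + counts[1]);
    }
}
```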

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
0.94.4 with plans to upgrade to the latest 0.94 release. On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com wrote: hey Viral, Which hbase version are you using?

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
I do have a heavy write operation going on. Actually heavy is relative. Not all tables/regions are seeing the same amount of writes at the same time. There is definitely a burst of writes that can happen on some regions. In addition to that there are some processing jobs which play catch up and

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks Azuryy. Look forward to it. Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created ? Tried looking around but could not find the details. On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote: your JVM options are not enough. I will give you some detail

Re: 答复: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Hey JD, Thanks for the clarification. I also came across a previous thread which sort of talks about a similar problem. http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E I guess my problem is also similar to

Re: NullPointerException when opening a region on new table creation

2013-06-26 Thread Viral Bajaria
and then eventually dropping the table. -Viral On Tue, Jun 25, 2013 at 5:20 PM, Viral Bajaria viral.baja...@gmail.com wrote: Hi JM, Yeah you are right about when the exception happens. I just went through all the logs of table creation and don't see an exception. Though there was a LONG pause when

NullPointerException when opening a region on new table creation

2013-06-25 Thread Viral Bajaria
Hi, I created a new table on my cluster today and hit a weird issue which I have not come across before. I wanted to run it by the list and see if anyone has seen this issue before and if not should I open a JIRA for it. It's still unclear why it would happen. I create the table

Re: NullPointerException when opening a region on new table creation

2013-06-25 Thread Viral Bajaria
fully created. So it's just normal for HBase to not open it. The issue is on the creation time. Do you still have the logs? Thanks, JM 2013/6/25 Viral Bajaria viral.baja...@gmail.com: Hi, I created a new table on my cluster today and hit a weird issue which I have not come across before

Re: querying hbase

2013-05-21 Thread Viral Bajaria
The shell allows you to use filters just like the standard HBase API but with jruby syntax. Have you tried that or that is too painful and you want a simpler tool ? -Viral On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote: are there any tools out there that can help in

Re: GET performance degrades over time

2013-05-17 Thread Viral Bajaria
Thanks for all the help in advance! Answers inline.. Hi Viral, some questions: Are you adding new data or deleting data over time? Yes I am continuously adding new data. The puts have not slowed down but that could also be an after effect of deferred log flush. Do you have bloom

Re: GET performance degrades over time

2013-05-17 Thread Viral Bajaria
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll phobos...@gmail.com wrote: Look at how much Hard Disk utilization you have (IOPS / Svctm). You may just be under scaled for the QPS you desire for both read + write load. If you are performing random gets, you could expect around the low to mid

GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Hi, My setup is as follows: 24 regionservers (7GB RAM, 8-core CPU, 5GB heap space) hbase 0.94.4 5-7 regions per regionserver I am doing an avg of 4k-5k random gets per regionserver per second and the performance is acceptable in the beginning. I have also done ~10K gets for a single regionserver

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Have you checked your HBase environment? I think it perhaps comes from: 1) System uses more swap frequently when you continue to execute Gets operation? I have set swap to 0. AFAIK, that's a recommended practice. Let me know if that should not be followed for nodes running HBase. 2) check

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side? When you say contention what should I be looking for ? slow operations to respond to data block requests ? or some specific metric in ganglia ? -Viral

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Going from memory, the swap value setting to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see? When I look at top it says: 0K total, 0K used, 0K free (as expected). I can try

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0, and I'm going from memory, it's possible for some swap to occur. (But I could be wrong.) Thanks for sharing this info. Will remember for future debugging too. Checked the vm.swappiness

Cached an already cached block (HBASE-5285)

2013-05-05 Thread Viral Bajaria
Hi, I have been consistently hitting the following error in one of my QA clusters. I came across two JIRAs, the first one (HBASE-3466) was closed saying Cannot Reproduce but a new one was re-opened under HBASE-5285. I am using HBase 0.94.4 and Hadoop 1.0.4 24 region servers (8 cores, 8GB RAM)

Re: Cached an already cached block (HBASE-5285)

2013-05-05 Thread Viral Bajaria
On Sun, May 5, 2013 at 10:45 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Just to confirm you are getting this with LruBlockCache? If with LruBlockCache then the issue is critical. Because we have faced similar issue with OffHeapCache. But that is not yet stable as far

Re: HBase and Datawarehouse

2013-04-30 Thread Viral Bajaria
On Mon, Apr 29, 2013 at 10:54 PM, Asaf Mesika asaf.mes...@gmail.com wrote: I think for Phoenix to truly succeed, it needs HBase to break the JVM heap barrier of 12G, as I saw mentioned in a couple of posts, since lots of analytics queries utilize memory, thus since its memory is shared with

Re: max regionserver handler count

2013-04-30 Thread Viral Bajaria
Thanks for getting back, Ted. I totally understand other priorities and will wait for some feedback. I am adding some more info to this post to allow better diagnosing of performance. I hit my region servers with a lot of GET requests (~20K per second per regionserver) using asynchbase in my test

Re: max regionserver handler count

2013-04-30 Thread Viral Bajaria
I am using asynchbase which does not have the notion of batch gets. It allows you to batch at a rowkey level in a single get request. -Viral On Mon, Apr 29, 2013 at 11:29 PM, Anoop John anoop.hb...@gmail.com wrote: You are making use of batch Gets? get(List) -Anoop-

Re: max regionserver handler count

2013-04-30 Thread Viral Bajaria
Looked closely into the async API and there is no way to batch GETs to reduce the # of RPC calls and thus handlers. Will play around tomorrow with the handlers again and see if I can find anything interesting. On Tue, Apr 30, 2013 at 12:03 AM, Anoop John anoop.hb...@gmail.com wrote: If you can

Re: max regionserver handler count

2013-04-29 Thread Viral Bajaria
On Sun, Apr 28, 2013 at 7:37 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: So you mean that when the handler count is more than 5k this happens when it is lesser this does not. Have you repeated this behaviour? What i doubt is when you say bouncing around different

Re: max regionserver handler count

2013-04-29 Thread Viral Bajaria
On Mon, Apr 29, 2013 at 2:25 AM, Ted Yu yuzhih...@gmail.com wrote: I noticed the 8 occurrences of 0x703e... following region server name in the abort message. I wonder why the repetition ? Cheers Oh good observation. I just stepped through the logs again and saw that the client timeout

Re: max regionserver handler count

2013-04-29 Thread Viral Bajaria
On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote: After each zookeeper reconnect, I saw same session Id (0x703e...) What version of zookeeper are you using ? Can you search zookeeper log for this session Id to see what happened ? Thanks Zookeeper version is 3.4.5,

Re: max regionserver handler count

2013-04-29 Thread Viral Bajaria
which had less handlers ( 15K), it stopped bouncing around. I am surprised bumping up handlers and having 0 traffic on the cluster can cause this issue. -Viral On Mon, Apr 29, 2013 at 1:23 PM, Viral Bajaria viral.baja...@gmail.com wrote: On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com

max regionserver handler count

2013-04-28 Thread Viral Bajaria
Hi, I have been trying to play around with the regionserver handler count. What I noticed was, the cluster comes up fine up to a certain point, ~7500 regionserver handler counts. But above that the system refuses to start up. They keep on spinning for a certain point. The ROOT region keeps on

Re: max regionserver handler count

2013-04-28 Thread Viral Bajaria
Yu yuzhih...@gmail.com wrote: bq. the setting is per regionserver (as the name suggests) and not per region right ? That is correct. Can you give us more information about your cluster size, workload, etc ? Thanks On Mon, Apr 29, 2013 at 4:30 AM, Viral Bajaria viral.baja...@gmail.com

Re: Coprocessors

2013-04-25 Thread Viral Bajaria
Phoenix might be able to solve the problem if the keys are structured in the binary format that it understands or else you are better off reloading that data in a table created via Phoenix. But I will let James tackle this question. Regarding your use-case, why can't you do the aggregation using

Re: RefGuide schema design examples

2013-04-19 Thread Viral Bajaria
+1! On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Wow, great work, Doug. 2013/4/19 Doug Meil doug.m...@explorysmedical.com Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them into here, plus added

Re: schema design: rows vs wide columns

2013-04-07 Thread Viral Bajaria
I think this whole idea of don't go over a certain number of column families was a 2+ year old story. I remember hearing numbers like 5 or 6 (not 3) come up when talking at Hadoop conferences with engineers who were at companies that were heavy HBase users. I agree with Andrew's suggestion that we

Re: Remote Connection To Pseudo Distributed HBase (Deployed in aws ec2) Not Working

2013-04-05 Thread Viral Bajaria
Are you sure that your hbase regionserver is registered with the external IP in zookeeper ? Your client (laptop) might be trying to connect to ec2 hbase using the internal host name which will not get resolved. To do a quick test, just modify the /etc/hosts on your laptop and put both the ec2

Re: HBase Client.

2013-03-20 Thread Viral Bajaria
Most of the clients listed below are language specific, so if your benchmarking scripts are written in Java, you are better off running the Java client. HBase Shell is more for running something interactive, not sure how you plan to benchmark that. REST is something that you could use, but I can't

Re: Compaction time

2013-03-03 Thread Viral Bajaria
How often do you run those jobs ? Do they run periodically or are they running all the time ? If you have a predictable periodic behavior, you could disable automatic compaction and trigger it manually using a cron job (not the recommended approach, AFAIK). Or you could set the compaction to

Re: Updating from 0.90.2 to 0.94

2013-02-26 Thread Viral Bajaria
Well if you can afford a longer downtime, you can always distcp your existing hbase data. This way if things get screwed up you can always restore a 0.90.x on that old backup. You cannot distcp while the cluster is running since it will not be able to get locks on file (I think I faced that issue

Re: Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above

2013-02-26 Thread Viral Bajaria
Cool !!! This is really good. I have a quick question though, is it possible to use Phoenix over existing tables ? I doubt it but just thought I will ask it on the list. On Tue, Feb 26, 2013 at 11:17 AM, Stack st...@duboce.net wrote: On Tue, Feb 26, 2013 at 10:02 AM, Graeme Wallace

Re: Custom HBase Filter : Error in readFields

2013-02-20 Thread Viral Bajaria
Also the readFields is your implementation of how to read the byte array transferred from the client. So I think there has to be some issue in how you write the byte array to the network and what you are reading out of that i.e. the size of arrays might not be identical. But as Ted mentioned,

Re: PreSplit the table with Long format

2013-02-19 Thread Viral Bajaria
HBase shell is a jruby shell and so you can invoke any java commands from it. For example: import org.apache.hadoop.hbase.util.Bytes Bytes.toLong(Bytes.toBytes(1000)) Not sure if this works as expected since I don't have a terminal in front of me but you could try (assuming the SPLITS keyword
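For readers who want to see what a long-formatted key looks like as bytes, here is a minimal sketch using only the JDK. It assumes the 8-byte big-endian layout that Bytes.toBytes(long) produces; the class name LongKeys and its methods are hypothetical stand-ins and are not part of HBase.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: JDK-only equivalent of the long <-> byte[] round trip
// shown in the shell snippet above (8 bytes, big-endian).
final class LongKeys {

    /** Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes 8 big-endian bytes back into a long. */
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] key = toBytes(1000L);
        System.out.println(Arrays.toString(key) + " -> " + toLong(key));
    }
}
```

With this layout, 1000 becomes the bytes 00 00 00 00 00 00 03 E8. Because the most significant byte comes first, non-negative longs sort in numeric order when compared byte by byte, which is what makes such values usable as split points.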

Re: availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread Viral Bajaria
I have come across this too, I think someone with authorization needs to perform a maven release to the apache maven repository and/or maven central. For now, I just end up compiling the dot release from trunk and deploy it to my local repository for other projects to use. Thanks, Viral On Tue,

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread Viral Bajaria
Hi Varun, Are your gets around sequential keys ? If so, you might benefit by doing scans with a start and stop. If they are not sequential I don't think there would be a better way from the way you describe the problem. Besides that, some of the questions that come to mind: - How many GET(s) are

RE: Using HBase for Deduping

2013-02-14 Thread Viral Bajaria
Are all these dupe events expected to be within the same hour or they can happen over multiple hours ? Viral From: Rahul Ravindran Sent: 2/14/2013 11:41 AM To: user@hbase.apache.org Subject: Using HBase for Deduping Hi, We have events which are delivered into our HDFS cluster which may be

Re: Using HBase for Deduping

2013-02-14 Thread Viral Bajaria
On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com wrote: Most will be in the same hour. Some will be across 3-6 hours. Sent from my phone. Excuse the terseness. On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com wrote: Are all these dupe events expected

Re: Using HBase for Deduping

2013-02-14 Thread Viral Bajaria
is that, doing a lookup per event within the MR job is going to be bad? From: Viral Bajaria viral.baja...@gmail.com To: Rahul Ravindran rahu...@yahoo.com Cc: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, February 14, 2013 12:48 PM Subject: Re: Using

question about pre-splitting regions

2013-02-14 Thread Viral Bajaria
Hi, I am creating a new table and want to pre-split the regions and am seeing some weird behavior. My table is designed as a composite of multiple fixed length byte arrays separated by a control character (for simplicity sake we can say the separator is _underscore_). The prefix of this rowkey

Re: question about pre-splitting regions

2013-02-14 Thread Viral Bajaria
I was able to figure it out. I had to use the createTable api which took splitKeys instead of the startKey, endKey and numPartitions. If anyone comes across this issue and needs more feedback feel free to ping me. Thanks, Viral On Thu, Feb 14, 2013 at 7:30 PM, Viral Bajaria viral.baja

Re: Announcing Phoenix: A SQL layer over HBase

2013-01-30 Thread Viral Bajaria
Congrats guys !!! This is something that was sorely missing in what I am trying to build... will definitely try it out... just out of curiosity, what kind of projects/tools at SalesForce use this library ? On Wed, Jan 30, 2013 at 5:55 PM, Huanyou Chang mapba...@mapbased.com wrote: Great tool, I

Re: Indexing Hbase Data

2013-01-28 Thread Viral Bajaria
When you say indexing, are you referring to indexing the column qualifiers or the values that you are storing in the qualifier ? Regarding indexing, I remember someone had recommended this on the mailing list before: https://github.com/ykulbak/ihbase/wiki but it seems the development on that is

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
the source repository ? -Viral On Mon, Jan 28, 2013 at 7:43 AM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote: Hi viral, Try adding -Psecurity and then compiling. Thanks Vandana Sent from my iPhone On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com wrote: Hi

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
and -Dhadoop.profile=23. That should work. Thanks Vandana On Jan 28, 2013, at 11:48 AM, Viral Bajaria wrote: Thanks Vandana for reply. I tried that but no luck. It still throws the same error. I thought there might have been a typo and you meant -D and not -P but none of them worked

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
, 2013 at 5:58 PM, Viral Bajaria viral.baja...@gmail.com wrote: Tried all of it, I think I will have to defer this to the hadoop mailing list because it seems there is a missing class in hadoop 0.23 branches, not sure if that is intentional. The class exists in trunk and hadoop 2.0 branches. Though

RE: Hbase Takes Long time to Restart - Whats the correct way to restart Hbase cluster?

2013-01-14 Thread Viral Bajaria
Is that one unassigned task even getting assigned or it errored out and

Re: DataXceiver java.io.EOFException

2012-11-29 Thread Viral Bajaria
Hi, Is your dfs.datanode.handler.count set to the default value of 3 ? I think I bumped it up when I got these exceptions and the issue wasn't due to xcievers. I would recommend increasing that to 6 and see if the error goes away or the frequency of the error decreases. Thanks, Viral On Wed,