Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Doug Meil
Hi there- re: "The max file size the whole cluster can store for one CF is 60G, right?" No, the max file-size for a region, in your example, is 60GB. When the data exceeds that, the region will split, and then you'll have 2 regions, each with a 60GB limit. Check out this section of the RefGuide:

Re: how to copy oracle to HBASE, just like goldengate

2012-11-02 Thread Doug Meil
Additionally, keep in mind that an RDBMS and HBase aren't the same thing. Check out these sections of the RefGuide if you haven't already. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema On 11/1/12 11:01 PM, Shumin Wu shumin...@gmail.com

Re: Development work focused on HFile v2

2012-11-03 Thread Doug Meil
The HBase RefGuide has a big entry in the appendix on HFile v2. On 11/3/12 5:34 PM, Marcos Ortiz mlor...@uci.cu wrote: Regards to all HBase users. I'm looking for all available information about the current development of HFile version 2 to write a blog post talking about the main

Re: Paging On HBASE like solr

2012-11-21 Thread Doug Meil
Hi there, Pretty similar approach with Hbase. See the Scan class. http://hbase.apache.org/book.html#data_model_operations On 11/21/12 1:04 PM, Vajrakumar vajra.ku...@pointcross.com wrote: Hello all, As we do paging in solr using start and rowCount I need to implement same through hbase.
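The paging idea in this thread (a start position plus a row count, as in Solr) can be sketched without a cluster. This is only a minimal sketch, assuming a sorted in-memory map stands in for an HBase table; PagingSketch, page, and the sample keys are hypothetical names and not HBase API. Because row keys are sorted, the sketch carries the last key of the previous page forward instead of a numeric offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PagingSketch {
    // Returns up to pageSize row keys that sort strictly after lastSeenKey.
    // Pass null as lastSeenKey to fetch the first page.
    static List<String> page(NavigableMap<String, String> table, String lastSeenKey, int pageSize) {
        NavigableMap<String, String> tail =
                (lastSeenKey == null) ? table : table.tailMap(lastSeenKey, false);
        List<String> keys = new ArrayList<>();
        for (String key : tail.keySet()) {
            if (keys.size() == pageSize) {
                break; // page is full
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Sorted map used as a stand-in for a table ordered by row key.
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        List<String> first = page(table, null, 2);
        List<String> second = page(table, first.get(first.size() - 1), 2);
        System.out.println(first + " then " + second);
    }
}
```

With the real client, the same pattern would presumably be a Scan whose start row is derived from the last key of the previous page, with the caller stopping after the desired number of rows.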

Re: Region hot spotting

2012-11-21 Thread Doug Meil
Hi there- If he's using monotonically increasing keys the pre splits won't help because the same region is going to get all the writes. http://hbase.apache.org/book.html#rowkey.design On 11/21/12 12:33 PM, Suraj Varma svarma...@gmail.com wrote: Ajay: Why would you not want to specify
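To make the hot-spotting point concrete: with monotonically increasing keys, every new key sorts after the previous one, so all writes land in the last region no matter how the table was pre-split. One commonly discussed workaround, which this thread does not itself prescribe, is to prefix each key with a hash-derived bucket (often called salting). A minimal sketch, assuming hypothetical names (SaltedKeySketch, bucketFor, salt) and an arbitrary bucket count of 4:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SaltedKeySketch {
    // Maps a key to a bucket in [0, buckets) using the first byte of its MD5 digest.
    static int bucketFor(String key, int buckets) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return (digest[0] & 0xFF) % buckets;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Prefixes the key with its two-digit bucket so consecutive keys no longer sort together.
    static String salt(String key, int buckets) {
        return String.format("%02d-%s", bucketFor(key, buckets), key);
    }

    public static void main(String[] args) {
        // Consecutive ids end up under different prefixes.
        for (long id = 1000; id < 1005; id++) {
            System.out.println(salt("event-" + id, 4));
        }
    }
}
```

The trade-off is on the read side: a range read over the original key order has to be issued once per bucket and merged by the caller.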

Re: Paging On HBASE like solr

2012-11-22 Thread Doug Meil
. -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: 22 November 2012 00:21 To: user@hbase.apache.org Subject: Re: Paging On HBASE like solr Hi there, Pretty similar approach with Hbase. See the Scan class. http://hbase.apache.org/book.html

Re: Expert suggestion needed to create table in Hbase - Banking

2012-11-26 Thread Doug Meil
Hi there, somebody already wisely mentioned the link to the # of CF's entry, but here are a few other entries that can save you some heartburn if you read them ahead of time. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema

Re: Connecting to standalone HBase from a remote client

2012-11-27 Thread Doug Meil
Hi there- re: "From what I have understood, these properties are not for Hbase but for the Hbase client which we write. They tell the client where to look for ZK." Yep. That's how it works. Then the client looks up ROOT/META and then the client talks directly to the RegionServers.

Re: Data Locality, HBase? Or Hadoop?

2012-12-03 Thread Doug Meil
Hi there- This is also discussed in the Regions section in the RefGuide: http://hbase.apache.org/book.html#regions.arch 9.7.3. Region-RegionServer Locality On 12/3/12 10:08 AM, Kevin O'dell kevin.od...@cloudera.com wrote: JM, If you have disabled the balancer and are manually moving

Re: Multiple regionservers on a single node

2012-12-03 Thread Doug Meil
Hi there, Not tried multi-RS on a single node, but have you looked at the off-heap cache? It's a part of 0.92.x. From what I understand that feature was designed with this case in mind (I.e., trying to do a lot of caching, but don't want to introduce GC issues in RS).

Re: Reg:delete performance on HBase table

2012-12-05 Thread Doug Meil
Hi there, You probably want to read this section on the RefGuide about deleting from HBase. http://hbase.apache.org/book.html#perf.deleting On 12/5/12 8:31 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Manoj, Delete in HBase is like a put. If you want to delete the entire

Re: CopyTable utility fails on larger tables

2012-12-05 Thread Doug Meil
I agree it shouldn't fail (slow is one thing, fail is something else), but regarding "HBase Master Web UI showed only one region for the destination table", you probably want to pre-split your destination table. It's writing to one region, splitting, writing to those regions, splitting, etc.

Re: Re: Re: what is the max size for one region and what is the max size of region for one server

2012-12-17 Thread Doug Meil
Hi there, When sizing your data, don't forget to read this... http://hbase.apache.org/book.html#schema.creation and http://hbase.apache.org/book.html#regions.arch 9.7.5.4. KeyValue You need to understand how HBase stores data internally on initial design to avoid problems down the line. Keep

Re: Is it necessary to set MD5 on rowkey?

2012-12-18 Thread Doug Meil
Hi there- You don't want a filter for this, use a Scan with the lead portion of the key. http://hbase.apache.org/book.html#datamodel See 5.7.3. Scans On a related topic, this is a utility in process to make composite key construction easier. https://issues.apache.org/jira/browse/HBASE-7221
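Scanning by "the lead portion of the key" comes down to choosing a start row equal to the prefix and a stop row that is the first key sorting after every key with that prefix. A minimal sketch of that calculation, assuming byte-wise unsigned key ordering; PrefixScanSketch and stopRowForPrefix are hypothetical names, and an empty stop row is taken here to mean "no upper bound":

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixScanSketch {
    // Smallest row key that sorts after every key starting with prefix,
    // or an empty array if no such key exists (prefix is empty or all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // unsigned increment of the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "user1".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(start, StandardCharsets.US_ASCII)
                + " .. " + new String(stop, StandardCharsets.US_ASCII));
    }
}
```

The resulting pair would be used as the start (inclusive) and stop (exclusive) rows of a scan, so only the regions holding that key range are touched rather than the whole table.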

Re: One weird problem of my MR job upon hbase table.

2013-01-07 Thread Doug Meil
Hi there, The HBase RefGuide has a comprehensive case study on such a case. This might not be the exact problem, but the diagnostic approach should help. http://hbase.apache.org/book.html#casestudies.slownode On 1/4/13 10:37 PM, Liu, Raymond raymond@intel.com wrote: Hi I encounter

Re: Constructing rowkeys and HBASE-7221

2013-01-15 Thread Doug Meil
for feedback, but this time aimed at users of HBase: how has your key-building experience been? Thanks! On 1/7/13 11:04 AM, Doug Meil doug.m...@explorysmedical.com wrote: Greetings folks- I would like to restart the conversation on https://issues.apache.org/jira/browse/HBASE-7221 because

Re: Constructing rowkeys and HBASE-7221

2013-01-17 Thread Doug Meil
://jira.kiji.org/browse/schema-3 where we have a design doc that goes into a bit more detail. Cheers, - Aaron On Tue, Jan 15, 2013 at 2:01 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, well, this request for input fell like a thud. :-) But I think perhaps it has to do with the fact that I

Re: Just joined the user group and have a question

2013-01-17 Thread Doug Meil
Hi there- If you're absolutely new to Hbase, you might want to check out the Hbase refGuide in the architecture, performance, and troubleshooting chapters first. http://hbase.apache.org/book.html In terms of determining why your region servers just die, I think you need to read the background

Re: How to de-normalize for this situation in HBASE Table

2013-01-18 Thread Doug Meil
Hi there, I'd recommend reading the Schema Design chapter in the RefGuide because there are some good tips and hard-learned lessons. http://hbase.apache.org/book.html#schema Also, all your examples use composite row keys (not a surprise, a very common pattern) and one thing I would like to

Re: Loading data, hbase slower than Hive?

2013-01-18 Thread Doug Meil
Hi there, See this section of the HBase RefGuide for information about bulk loading. http://hbase.apache.org/book.html#arch.bulk.load On 1/18/13 12:57 PM, praveenesh kumar praveen...@gmail.com wrote: Hey, Can someone throw some pointers on what would be the best practice for bulk imports

Re: Regarding HBase Hadoop multiple scan objects issue

2013-01-18 Thread Doug Meil
Hi there- You probably want to review this section of the RefGuide: http://hbase.apache.org/book.html#mapreduce re: "it's inefficient to have one scan object to scan everything." It is. But in the MapReduce case, there is a Map-task for each input split (see the RefGuide for details), and

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Doug Meil
Hi there- On top of what everybody else said, for more info on rowkey design and pre-splitting see http://hbase.apache.org/book.html#schema (as well as other threads in this dist-list on that topic). On 1/19/13 4:12 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Austin, I am

Re: Join Using MapReduce and Hbase

2013-01-24 Thread Doug Meil
Hi there- Here is a comment in the RefGuide on joins in the HBase data model. http://hbase.apache.org/book.html#joins Short answer, you need to do it yourself (e.g., either with an in-memory hashmap or instantiating an HTable of the other table, depending on your situation). For other MR

Re: question about pre-splitting regions

2013-02-15 Thread Doug Meil
Good to hear! Given your experience, I'd appreciate your feedback on the section 6.3.6. Relationship Between RowKeys and Region Splits in... http://hbase.apache.org/book.html#schema.creation ... because it's on that same topic. Any other points to add to this? Thanks! On 2/14/13 11:08 PM,

Re: HBase type support

2013-03-18 Thread Doug Meil
Sorry I'm late to this thread but I was the guy behind HBASE-7221 and the algorithms specifically mentioned were MD5 and Murmur (not SHA-1). An implementation of Murmur already exists in HBase, and the MD5 implementation was the one that ships with Java. The intent was to include hashing

Re: HBase Types: Explicit Null Support

2013-04-01 Thread Doug Meil
Hmmm... good question. I think that fixed width support is important for a great many rowkey construction cases, so I'd rather see something like losing MIN_VALUE and keeping fixed width. On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote: Heya, Thinking about data types and

Re: schema design: rows vs wide columns

2013-04-08 Thread Doug Meil
For the record, the refGuide mentions potential issues of CF lumpiness that you mentioned: http://hbase.apache.org/book.html#number.of.cfs 6.2.1. Cardinality of ColumnFamilies Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If

Re: ANN: HBase Refcard available

2013-04-09 Thread Doug Meil
You beat me to it! :-) I just realized that right when I hit enter on my previous email. On 4/9/13 2:05 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Stack (cleaning your inbox? ;)) Looks like Doug did it a while back - https://issues.apache.org/jira/browse/HBASE-6574 ?

Re: Re: HBase random read performance

2013-04-15 Thread Doug Meil
Hi there, regarding this... "We are passing random 10000 row-keys as input, while HBase is taking around 17 secs to return 10000 records." ... Given that you are generating 10,000 random keys, your multi-get is very likely hitting all 5 nodes of your cluster. Historically, multi-Get used to

RefGuide schema design examples

2013-04-19 Thread Doug Meil
Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them into here, plus added several cases common on the dist-list. http://hbase.apache.org/book.html#schema.casestudies Comments/suggestions welcome. Thanks! Doug Meil Chief Software Architect, Explorys

Re: RefGuide schema design examples

2013-04-21 Thread Doug Meil
, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Wow, great work, Doug. 2013/4/19 Doug Meil doug.m...@explorysmedical.com Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them into here, plus

RE: Row Key Question

2011-02-16 Thread Doug Meil
Hi there- As was described in the HBase chapter in the Hadoop book by Tom White, you don't want to insert a lot of data at one time with incrementing keys. YYYY-MM-DD would seem to me to be a reasonable lead-portion of a key - as long as you aren't trying to insert everything in time-order

RE: JobControl and HBase MR chaining

2011-03-22 Thread Doug Meil
You need to execute two Jobs serially that use TableMapper in a thread. Can't use JobControl. -Original Message- From: Vishal Kapoor [mailto:vishal.kapoor...@gmail.com] Sent: Tuesday, March 22, 2011 1:34 PM To: user@hbase.apache.org Subject: JobControl and HBase MR chaining with

RE: Observer/Observable MapReduce

2011-03-25 Thread Doug Meil
The simplest way to do this is with a thread that executes the jobs you want to run synchronously Job job1 = ... job1.waitForCompletion(true); Job job2 = ... job2.waitForCompletion(true); -Original Message- From: Vishal Kapoor
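The "thread that executes the jobs synchronously" pattern can be sketched with plain JDK classes. This is a minimal sketch under stated assumptions: each Callable<Boolean> is a hypothetical stand-in for a blocking job1.waitForCompletion(true) call, and SerialJobsSketch and runSerially are made-up names. The second step only starts if the first one reports success.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SerialJobsSketch {
    // Runs the steps in order on the executor's thread and stops at the first
    // step that returns false. The Future yields how many steps succeeded.
    static Future<Integer> runSerially(ExecutorService executor, List<Callable<Boolean>> steps) {
        return executor.submit(() -> {
            int completed = 0;
            for (Callable<Boolean> step : steps) {
                if (!step.call()) {
                    break; // do not start later jobs after a failure
                }
                completed++;
            }
            return completed;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            List<Callable<Boolean>> steps = new ArrayList<>();
            // Each lambda stands in for a blocking call such as job.waitForCompletion(true).
            steps.add(() -> { System.out.println("job1 finished"); return true; });
            steps.add(() -> { System.out.println("job2 finished"); return true; });
            System.out.println("completed steps: " + runSerially(executor, steps).get());
        } finally {
            executor.shutdown();
        }
    }
}
```

The caller's thread stays free while the chain runs; it can poll or block on the returned Future when it needs the outcome.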

HBase wiki updated

2011-04-02 Thread Doug Meil
Hi there everybody- Just thought I'd let everybody know about this... Stack and I have been working on updating the HBase book and porting portions of the very-out-of-date HBase wiki to the HBase book. These two pages... http://wiki.apache.org/hadoop/Hbase/DesignOverview

RE: hbase

2011-04-10 Thread Doug Meil
(changing to user hbase distlist instead of general hadoop) Hive runs as MapReduce jobs. If you are looking for quick (i.e., non-MapReduce) access to tables, you need to use the HBase client (Get, Scan). http://hbase.apache.org/book.html#datamodel -Original Message- From: Mag Gam

RE: too many regions cause OME ?

2011-04-11 Thread Doug Meil
Re: maxHeap=3991 Seems like an awful lot of data to put in a 4gb heap. -Original Message- From: 陈加俊 [mailto:cjjvict...@gmail.com] Sent: Monday, April 11, 2011 8:35 PM To: hbase-u...@hadoop.apache.org Subject: too many regions cause OME ? Is it too many regions ? Is the memory

RE: HBase is not ready for Primetime

2011-04-13 Thread Doug Meil
Hi there- For what it's worth, although we haven't had this particular issue we've certainly had other bumps and bruises (GC of death, and other metadata issues caused when a split dies during a GC of death, etc.). But there are a few general items that helped in stability and performance I

RE: HBase is not ready for Primetime

2011-04-13 Thread Doug Meil
Context: we're still on .89 - so we can't take advantage of the MemStore allocation buffers yet. One of the most important metrics for us was GC-stuck region servers, and more nodes + more memory + scheduling periodic cluster restarts helped in our situation. I wholeheartedly agree with the

RE: heap memory allocation

2011-04-15 Thread Doug Meil
The HBase book has come through some updates recently on these metrics. http://hbase.apache.org/book.html#hbase_metrics -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Friday, April 15, 2011 5:04 AM To: user@hbase.apache.org Cc: 陈加俊;

RE: Should I be afraid by 'put','get'...

2011-04-25 Thread Doug Meil
Hi there- Review the HBase book too. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#client http://hbase.apache.org/book.html#performance -Original Message- From: JohnJohnGa [mailto:johnjoh...@gmail.com] Sent: Sunday, April 24, 2011 2:46 AM To:

RE: Row count without iterating over ResultScanner?

2011-05-01 Thread Doug Meil
experiment more with this option. Right now I want to stay away from MR, mainly because of cluster warm-up time, and I want to get results almost real-time (few seconds max). Thanks for the tip on caching! On 01.05.2011 19:55, Doug Meil wrote: What caching value are you using on the scan? If you

RS data-transfer metric suggestion...

2011-05-09 Thread Doug Meil
Hi everybody- I just thought I'd ping the group to see what everybody thought about this RS metric suggestion... https://issues.apache.org/jira/browse/HBASE-3869 Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

RE: just trying to get into HBase from java

2011-05-09 Thread Doug Meil
Read this section in the HBase book... http://hbase.apache.org/book.html#d427e2108 Java client configuration -Original Message- From: James McGlothlin [mailto:mcglothli...@gmail.com] Sent: Monday, May 09, 2011 4:57 PM To: hbase-u...@hadoop.apache.org Subject: just trying to get into

RE: HBase book/Wiki update - 5-12-2011

2011-05-12 Thread Doug Meil
Hi folks, the wiki Troubleshooting page is now obsolete. It's accessible via the obsolete pages link. Stack just updated the website with the updates to the Troubleshooting section. Enjoy! -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: Friday, May

RE: major compaction best practice

2011-05-16 Thread Doug Meil
For starters, take a look at this... http://hbase.apache.org/book.html#perf.configurations -Original Message- From: Oleg Ruchovets [mailto:oruchov...@gmail.com] Sent: Monday, May 16, 2011 6:42 AM To: user@hbase.apache.org Subject: major compaction best practice Hi , We running

RE: number of column families

2011-05-16 Thread Doug Meil
It's currently bad in general. -Original Message- From: Lars Egarots [mailto:lars.egar...@yahoo.com] Sent: Monday, May 16, 2011 12:36 PM To: user@hbase.apache.org Subject: number of column families The user documentation, in the Apache HBase book, states: HBase currently does not do

RE: HBase Scalability

2011-05-18 Thread Doug Meil
Hi there- Re: When I started inserting data in the tables it seems that they are always inserting in a single region, You probably want to read this as a general warning... http://hbase.apache.org/book.html#timeseries .. and check this out as a potential solution for bucketing timeseries

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Re: The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the second table. With multi-get in .90.x you could perform some reasonably clever processing and not do the lookups one-by-one but in

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Eran's observation was that a join is solvable in a Mapper via lookups on a 2nd HBase table, but it might not be that efficient if the lookups are 1 by 1. I agree with that. My suggestion was to use multi-Get for the lookups instead. So you'd hold onto a batch of records in the Mapper and
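The "hold onto a batch of records and look them up together" suggestion can be sketched as a small helper. This is a minimal sketch, assuming the batched call to the second table is represented by a plain Function from a list of keys to a map of results; BatchedLookupSketch and its method names are hypothetical and not part of the HBase client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchedLookupSketch<K, V> {
    private final int batchSize;
    // Stand-in for one batched (multi-get style) call against the second table.
    private final Function<List<K>, Map<K, V>> multiGet;
    private final List<K> pending = new ArrayList<>();
    private final Map<K, V> results = new HashMap<>();
    private int calls = 0;

    BatchedLookupSketch(int batchSize, Function<List<K>, Map<K, V>> multiGet) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.multiGet = multiGet;
    }

    // Queues a key and issues one batched lookup when the batch is full.
    void add(K key) {
        pending.add(key);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Looks up whatever is still queued; call once after the last input record.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        results.putAll(multiGet.apply(new ArrayList<>(pending)));
        pending.clear();
        calls++;
    }

    Map<K, V> results() { return results; }

    int calls() { return calls; }

    public static void main(String[] args) {
        BatchedLookupSketch<String, String> lookups = new BatchedLookupSketch<>(10, keys -> {
            Map<String, String> found = new HashMap<>();
            for (String k : keys) {
                found.put(k, "value-for-" + k);
            }
            return found;
        });
        for (int i = 0; i < 25; i++) {
            lookups.add("key" + i);
        }
        lookups.flush();
        System.out.println(lookups.results().size() + " rows in " + lookups.calls() + " calls");
    }
}
```

In a Mapper, add would presumably be called from map() and the final flush from cleanup(), so the number of round trips drops from one per record to one per batch.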

RE: Timeouts on gets and puts

2011-06-02 Thread Doug Meil
1) You probably want to upgrade to a more recent version of HBase 2) You probably want to read this: http://hbase.apache.org/book.html#performance Not knowing anything about your cluster size or table design, Put delays can be exacerbated by a number of things. 3) As for the time limit, I

RE: Question from HBase book: HBase currently does not do well with anything above two or three column families

2011-06-02 Thread Doug Meil
Re: Is that still considered current? Do folks on the list generally agree with that guideline? Yes and yes. HBase runs better with fewer CFs. -Original Message- From: Leif Wickland [mailto:leifwickl...@gmail.com] Sent: Thursday, June 02, 2011 5:41 PM To: user@hbase.apache.org

RE: using HBase to read from xml files in HDFS

2011-06-03 Thread Doug Meil
Per... http://hbase.apache.org/book.html#supported.datatypes ... you could store an XML file in HBase as a single value, but the responsibility of parsing the XML and interpreting it is entirely yours. -Original Message- From: James Ram [mailto:hbas...@gmail.com] Sent: Friday, June 03,

RE: performance monitoring question

2011-06-05 Thread Doug Meil
You probably want to read this... http://hbase.apache.org/book.html#performance -Original Message- From: Hiller, Dean x66079 [mailto:dean.hil...@broadridge.com] Sent: Sunday, June 05, 2011 7:22 PM To: hbase-u...@hadoop.apache.org Subject: performance monitoring question So, we were

RE: How to efficiently join HBase tables?

2011-06-06 Thread Doug Meil
Re: "So, you all realize the joins have been talked about in the database community for 40 years?" Great point. What's old is new! :-) My suggestion from earlier in the thread was a variant of nested loops by using multi-get in HTable, which would reduce the number of RPC calls. So it's a

RE: full table scan

2011-06-06 Thread Doug Meil
Check the web console. -Original Message- From: Andre Reiter [mailto:a.rei...@web.de] Sent: Monday, June 06, 2011 5:27 PM To: user@hbase.apache.org Subject: Re: full table scan good question... i have no idea... i did not define explicitly the number of regions for the table, how can

RE: HBase Get issue

2011-06-07 Thread Doug Meil
Are you experiencing this? https://issues.apache.org/jira/browse/HBASE-3686 One of our guys found and fixed this a while back. This was found with Scans, but since Gets are implemented as Scans on the RS I thought this might be relevant. -Original Message- From: Zhenyu Zhong

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
Re: With respect to Doug's posts, you can't do a multi-get off the bat That's an assumption, but you're entitled to your opinion. -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Monday, June 06, 2011 10:08 PM To: user@hbase.apache.org Subject: RE: How to

RE: distribution of regions to servers

2011-06-08 Thread Doug Meil
If I understand the history correctly, round-robin was used in .89, but retains is the policy for .90+. My 2-cents is that if/when region-shuffling is required, I'd rather do that with another utility and keep that out of cluster startup. -Original Message- From: saint@gmail.com

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
Segel On Jun 8, 2011, at 8:01 AM, Doug Meil doug.m...@explorysmedical.com wrote: Re: With respect to Doug's posts, you can't do a multi-get off the bat That's an assumption, but you're entitled to your opinion. -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com

RE: Hbase rowkey question ?

2011-06-10 Thread Doug Meil
Praveenesh, in addition to what Joey already said in another response to your question, see these chapters in the HBase book. http://hbase.apache.org/book.html#schema http://hbase.apache.org/book.html#datamodel -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com]

RE: Question from HBase book: HBase currently does not do well with anything above two or three column families

2011-06-13 Thread Doug Meil
Re: keyed by something like [timestamp, action details, session ID] Read the part about monotonically increasing keys in the HBase book. There have been lots of other threads in the dist-list about this topic too. -Original Message- From: Leif Wickland

RE: Question from HBase book: HBase currently does not do well with anything above two or three column families

2011-06-13 Thread Doug Meil
Re: monotonically increasing column names. No problem with that. -Original Message- From: Leif Wickland [mailto:leifwickl...@gmail.com] Sent: Monday, June 13, 2011 5:29 PM To: user@hbase.apache.org Subject: Re: Question from HBase book: HBase currently does not do well with

RE: Incoming Row Distribution Strategy/Algorithm Among Region Servers?

2011-06-15 Thread Doug Meil
This is briefly covered in the client architecture overview... http://hbase.apache.org/book.html#client ... the gist is that as David describes the client talks directly to the RegionServers, and knows the start/end keys available. -Original Message- From: Buttler, David

RE: Incoming Row Distribution Strategy/Algorithm Among Region Servers?

2011-06-15 Thread Doug Meil
ahead of time if you have a good handle on the data distribution. -chris On Jun 15, 2011, at 10:47 AM, Shuja Rehman wrote: yeah, i understand this but my question was that who will define the start and stop key of a region server? did u get my point? On Wed, Jun 15, 2011 at 9:53 PM, Doug

RE: random versus ordered input strategies

2011-06-15 Thread Doug Meil
Assuming that this approach was taken to create new regions instead of splitting, since this is assuming monotonically increasing keys you would always have one hot region that would be receiving all of the new data. That would limit load throughput. -Original Message- From:

RE: Problem of output

2011-06-16 Thread Doug Meil
Hi there- The rowids and values are hex. You probably want to look at the hbase book... http://hbase.apache.org/book.html#datamodel ... because eventually you'll want to do something with this data other than see it in the shell. -Original Message- From: hbaser

RE: Add a column family to a table

2011-06-17 Thread Doug Meil
You need to go through HBaseAdmin. http://hbase.apache.org/book.html#schema.creation Disable the table, add the CF, then re-enable the table. -Original Message- From: Eranda Sooriyabandara [mailto:0704...@gmail.com] Sent: Friday, June 17, 2011 2:11 PM To: user@hbase.apache.org

RE: hbase architecture question

2011-06-18 Thread Doug Meil
Hi there- There is a section in the hbase book on pre-creating regions. http://hbase.apache.org/book.html#precreate.regions -Original Message- From: Hiller, Dean x66079 [mailto:dean.hil...@broadridge.com] Sent: Friday, June 17, 2011 4:28 PM To: user@hbase.apache.org Subject: RE: hbase

RE: Insert a lot of data in HBase

2011-06-20 Thread Doug Meil
Look here in the HBase book for these, and other, tips. http://hbase.apache.org/book.html#performance -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Monday, June 20, 2011 2:03 PM To: user@hbase.apache.org Subject: Re:

RE: Insert a lot of data in HBase

2011-06-20 Thread Doug Meil
Just be aware that the data in the write-buffer is in the client - so it hasn't been sent to the RegionServers yet. So as long as your client doesn't die, you should be ok. -Original Message- From: saurabh@gmail.com [mailto:saurabh@gmail.com] On Behalf Of Sam Seigal Sent:

RE: TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread Doug Meil
TableOutputFormat also does this... table.setAutoFlush(false); Check out the HBase book for how the writebuffer works with the HBase client. http://hbase.apache.org/book.html#client -Original Message- From: edward choi [mailto:mp2...@gmail.com] Sent: Tuesday, June 21, 2011 10:23

HBase book updates

2011-06-22 Thread Doug Meil
/Troubleshooting chapters. Additionally, there are more entries in Troubleshooting plus a few other code examples sprinkled in. If anybody has difficulty finding anything, just email the dist-list. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

RE: HBase book updates

2011-06-22 Thread Doug Meil
It's already deployed. -Original Message- From: Zhong, Andy [mailto:sheng.zh...@searshc.com] Sent: Wednesday, June 22, 2011 10:54 AM To: user@hbase.apache.org; hbase-u...@hadoop.apache.org; Doug Meil Subject: RE: HBase book updates Doug, could you share them? Thanks, -Andy Zhong

RE: HBase doesn't work....

2011-06-22 Thread Doug Meil
Hi there- You probably want to give this a read ... http://hbase.apache.org/book.html#architecture ... most people run Master on the NameNode. -Original Message- From: Laurent Hatier [mailto:laurent.hat...@gmail.com] Sent: Wednesday, June 22, 2011 5:14 PM To: user@hbase.apache.org

RE: Running MapReduce from a web application

2011-06-24 Thread Doug Meil
Hi there- Take a look at this for starters... http://hbase.apache.org/book.html#mapreduce if you do job.waitForCompletion(true); it will execute synchronously. If you do job.waitForCompletion(false) it will fire and forget. A simple pattern is to spin off a thread where it executes

RE: Obtain many mappers (or regions)

2011-06-27 Thread Doug Meil
Hi there- If you only have 100 rows I think that HBase might be overkill. You probably want to start with this to get a background on what HBase can do... http://hbase.apache.org/book.html .. there is a section on MapReduce with HBase as well. -Original Message- From: Florin P

Re: Add a column family to a table

2011-06-30 Thread Doug Meil
, 2011 at 23:54, Doug Meil doug.m...@explorysmedical.com wrote: No problem! I added a patch to the book for this case. It probably could have been a little more obvious. -Original Message- From: Eranda Sooriyabandara [mailto:0704...@gmail.com] Sent: Friday, June 17, 2011 3:54 PM

Re: How to config gc policy when server machine memory is greater than 64G

2011-06-30 Thread Doug Meil
I'd start with the HBase book http://hbase.apache.org/book.html#gc On 6/29/11 10:50 PM, xiujin yang xiujiny...@hotmail.com wrote: Hi all Background: Hadoop : CDH3u0 HBase : CDH3u0 ZK : CDH3u0 Servers: 30 Now our hbase server is more than 64G. and we want to use hbase on online

Re: question in retrieving data from hbase

2012-03-10 Thread Doug Meil
Hi there- There is a chapter in the Hbase RefGuide on the Hbase data model that might be helpful. http://hbase.apache.org/book.html#datamodel On 3/10/12 1:30 AM, newbie24 shripri...@hotmail.com wrote: Thanks Harsh..little confused ..want to clarify some more the row key i have is a

Re: Memory Requirements

2012-03-10 Thread Doug Meil
Hi there- Here are the recommendations from the HBase RefGuide: http://hbase.apache.org/book.html#perf.os ... and they are consistent with what the book says (recommends 64-bit OS and more memory). Also, keep this in mind... http://hbase.apache.org/book.html#arch.overview ... the

Re: hbase performance issue

2012-03-11 Thread Doug Meil
If you're using Cloudera, you want to be on CDH3u3 because it has several HDFS performance fixes for low-latency reads. That still doesn't address your 23:00-hour perf issue, but that's something that will help. On 3/11/12 3:39 PM, Антон Лыска ant...@wildec.com wrote: Hi guys! I have a

Re: example of mapreduce output to hbase

2012-03-11 Thread Doug Meil
Hi there- Have you seen the examples in here? http://hbase.apache.org/book.html#mapreduce On 3/11/12 4:59 PM, Weishung Chung weish...@gmail.com wrote: Hey users, I am trying to store mapreduce output directly to HBase. Basically I have a regular mapper reading from files and would like

Re: Retrieve Column Family and Column with Java API

2012-03-12 Thread Doug Meil
Hi there- You probably want to see this... http://hbase.apache.org/book.html#dm.column.metadata You can get the CF's from HTableDescriptor. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html On 3/12/12 10:10 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote:

Re: partial scanning

2012-03-14 Thread Doug Meil
Scans are also described in the RefGuide here... http://hbase.apache.org/book.html#data_model_operations On 3/14/12 2:22 AM, Akbar Gadhiya akbar.gadh...@gmail.com wrote: Hi, You can perform scan this way, scan 'tablename', {STARTROW='name + start time stamp', ENDROW='name + end time

Re: HBase rowkey RDBMS PK

2012-03-14 Thread Doug Meil
Hi there- You probably want to see this in the RefGuide... http://hbase.apache.org/book.html#schema On 3/14/12 9:47 PM, 韶隆吴 yechen1...@gmail.com wrote: Hi all: I'm trying to import data from oracle to hbase and now I have a problem. In some tables,there have more than one primary

Re: Scan.addFamiliy reduces results

2012-03-15 Thread Doug Meil
re: However, I am getting different number of results, depending on which families are added Yes. I'd suggest you read this in the RefGuide. http://hbase.apache.org/book.html#datamodel On 3/15/12 12:08 PM, Peter Wolf opus...@gmail.com wrote: Hi all, I am doing a scan on a table with

Re: Confirming a Bug

2012-03-23 Thread Doug Meil
Speculative execution is on by default. http://hbase.apache.org/book.html#mapreduce.specex On 3/23/12 8:04 AM, Peter Wolf opus...@gmail.com wrote: Hi Michel, I agree it doesn't make sense, but then I believe we are tracking a bug. I don't know about speculative execution, but I certainly

HBase RefGuide updated

2012-03-28 Thread Doug Meil
Hi folks- The HBase RefGuide has been updated on the website. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: HBase RefGuide updated

2012-03-28 Thread Doug Meil
each of those links is a separate study in this new chapter. From: Doug Meil doug.m...@explorysmedical.com Date: Wed, 28 Mar 2012 17:50:01 -0400 To: user@hbase.apache.org Subject

Re: 0.92 and Read/writes not scaling

2012-03-30 Thread Doug Meil
Just as a quick reminder regarding what Todd mentioned, that's exactly what was happening in this case study... http://hbase.apache.org/book.html#casestudies.slownode ... although it doesn't appear to be the problem in this particular situation. On 3/29/12 8:22 PM, Juhani Connolly

Re: HBase bulk loader doing speculative execution when it set to false in mapred-site.xml

2012-03-30 Thread Doug Meil
Speculative execution is on by default in Hadoop. One of the Performance recommendations in the Hbase RefGuide is to turn it off. On 3/30/12 6:12 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Well that's not an HBase configuration, that's Hadoop. I'm not sure if this is listed

Re: HBase database sample

2012-04-02 Thread Doug Meil
See the link to the BigTable paper here... http://hbase.apache.org/book.html#other.info ... and there is other reading material and videos too. On 4/1/12 11:30 PM, Mahdi Negahi negahi.ma...@hotmail.com wrote: thanks, but all databases have good examples , like Cinema in Neo4j and etc. but

Re: HBase database sample

2012-04-02 Thread Doug Meil
Also, see this chapter. http://hbase.apache.org/book.html#schema On 4/2/12 11:40 AM, Doug Meil doug.m...@explorysmedical.com wrote: See the link to the BigTable paper here... http://hbase.apache.org/book.html#other.info ... and there is other reading material and videos too. On 4/1

Re: HBaseCon Where is the information?

2012-04-02 Thread Doug Meil
HBaseCon is also on the home page... http://hbase.apache.org/ On 4/2/12 3:18 PM, Lars George lars.geo...@gmail.com wrote: http://www.hbasecon.com/ On Apr 2, 2012, at 10:16 PM, Marcos Ortiz wrote: I heard yesterday that the first conference dedicated to HBase will be in the next days.

Re: How to debug and run hadoop/HBase source code in eclipse

2012-04-04 Thread Doug Meil
Hi there- See... http://hbase.apache.org/book.html#developer On 4/4/12 8:02 AM, Asmi smita.j...@gmail.com wrote: Hi, May I know is there any book just like you suggested for HBase to make the changes. Asmi.

Re: hbase map/reduce questions

2012-04-04 Thread Doug Meil
Hi there, you probably want to see this.. http://hbase.apache.org/book.html#splitter ... as well as this... http://hbase.apache.org/book.html#regions.arch.locality ... as the latter describes data locality. On 4/4/12 7:41 AM, sdnetwork sdnetw...@gmail.com wrote: Hello, I started

Re: hbase map/reduce questions

2012-04-04 Thread Doug Meil
The default behavior is that the input splits are where the data is stored. On 4/4/12 5:24 PM, sdnetwork sdnetw...@gmail.com wrote: ok thanks, but i don't find the information that tell me how the result of the split is distributed across the different node of the cluster ? 1) randomly ?

Re: hbase map/reduce questions

2012-04-05 Thread Doug Meil
. with the default behavior only two nodes will work for a map/reduce task., isn't it ? if i do a custom input that split the table by 100 rows, can i distribute manually each part on a node regardless where the data is ? On 5 April 2012 00:36, Doug Meil doug.m...@explorysmedical.com wrote: The default

Re: Is HBase Thread-Safety?

2012-04-12 Thread Doug Meil
re: Is HBase thread-safety? HTable instances are not thread safe, though. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html On 4/12/12 6:10 PM, Bing Li lbl...@gmail.com wrote: Dear all, Is HBase thread-safety? Do I need to consider the consistency issue when
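Since HTable instances are not thread safe, one simple way to avoid sharing them is to give each thread its own instance. A minimal sketch of that idea using only the JDK, assuming a hypothetical holder class (PerThreadClientSketch) and a StringBuilder as a stand-in for any non-thread-safe client object; it is an illustration, not a recommendation taken from the thread.

```java
import java.util.function.Supplier;

final class PerThreadClientSketch<T> {
    private final ThreadLocal<T> local;

    // The factory is invoked once per thread, the first time that thread calls get().
    PerThreadClientSketch(Supplier<T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's own instance; never shared across threads.
    T get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a non-thread-safe client object.
        PerThreadClientSketch<StringBuilder> clients = new PerThreadClientSketch<>(StringBuilder::new);
        Runnable work = () -> System.out.println(Thread.currentThread().getName()
                + " uses instance " + System.identityHashCode(clients.get()));
        Thread a = new Thread(work, "worker-a");
        Thread b = new Thread(work, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Each worker sees a different instance, while repeated calls from the same thread return the same one, so no external locking is needed around the client object itself.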
