Re: RDBMS to HBASE schema migration

2012-06-28 Thread Doug Meil
Hi there- I commend your enthusiasm for the Hbase project. For the ground rules of Hbase you probably want to read this closely... http://hbase.apache.org/book.html#datamodel ... as it covers things like having one PK per table, no secondary indexes, etc. With a solid understanding of these

Re: Timestamp as a key good practice?

2012-06-14 Thread Doug Meil
Will do! On 6/14/12 2:06 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: JM, have a look at https://github.com/sematext/HBaseWD (this comes up often Doug, maybe you could add it to the Ref Guide?) Otis Performance Monitoring for Solr / ElasticSearch / HBase -
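
For readers landing on this thread: the idea behind spreading out sequential (for example, timestamp) row keys can be sketched as a one-byte bucket prefix. This is a minimal illustration in plain Java, not HBaseWD's actual API; the class name `SaltedRowKeys`, its methods, and the choice of 8 buckets are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of HBase or HBaseWD.
final class SaltedRowKeys {
    /** Prefixes the key with one bucket byte derived from a hash of the key itself. */
    static byte[] saltedKey(byte[] originalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        int bucket = Math.floorMod(Arrays.hashCode(originalKey), buckets);
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }

    /** Drops the bucket byte to recover the original key. */
    static byte[] originalKey(byte[] saltedKey) {
        return Arrays.copyOfRange(saltedKey, 1, saltedKey.length);
    }

    /** Big-endian timestamp: the kind of monotonically increasing key that hotspots one region. */
    static byte[] timestampKey(long millis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(millis).array();
    }

    public static void main(String[] args) {
        // Consecutive timestamps end up under different leading bytes.
        for (long ts = 1000; ts < 1004; ts++) {
            byte[] salted = saltedKey(timestampKey(ts), 8);
            System.out.println(ts + " -> bucket " + (salted[0] & 0xFF));
        }
    }
}
```

The trade-off is on the read side: a time-range scan has to be issued once per bucket and the results merged, since rows that were adjacent are now spread across prefixes.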

Re: HBase first steps: Design a table

2012-06-13 Thread Doug Meil
Just wanted to point out that is also discussed under the autoFlush entry in this chapter.. http://hbase.apache.org/book.html#perf.writing .. but I think this could be better highlighted. I will fix it. On 6/13/12 10:25 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi N. The

Re: The write process in the Region Server

2012-06-11 Thread Doug Meil
Hi there- Your understanding is on track. You probably want to read this section.. http://hbase.apache.org/book.html#regions.arch ... as it covers those topics in more detail. On 6/10/12 1:02 PM, Amit Sela am...@infolinks.com wrote: Hi all, I'm trying to better understand what's going on

Re: hbase lookups in file, source location

2012-06-06 Thread Doug Meil
Have you looked at any of the source code? On 6/6/12 10:35 AM, S Ahmed sahmed1...@gmail.com wrote: From what I understand each column family has its own file(s). When hbase lookups a cell in a column family, how does it perform the lookup? Is there an index on the file, or is it ordered?

Re: how does hbase get the latest version with immutable hfiles?

2012-06-02 Thread Doug Meil
Hi there, I think you probably want to look at this... Hbase catalog metadata... http://hbase.apache.org/book.html#arch.catalog How data is stored internally... http://hbase.apache.org/book.html#regions.arch Lots of versioning description here... http://hbase.apache.org/book.html#datamodel Long

Re: When does compaction actually occur?

2012-06-02 Thread Doug Meil
Related to when does compaction actually occur?, although the original question was about the web UI you might also want to see this... http://hbase.apache.org/book.html#regions.arch ... for an overview of the compaction file-selection algorithm. On 6/2/12 8:42 AM, lars hofhansl

Re: RefGuide updated

2012-05-30 Thread Doug Meil
...@gmail.com wrote: Thanks Doug, very instructive. Do you have number to feel the gain using bulk loading On May 24, 2012 4:18 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi folks- The RefGuide was updated in a big way at the Hackathon yesterday. Two things to note: http

Re: A question about HBase MapReduce

2012-05-25 Thread Doug Meil
re: data from raw data file into hbase table One approach is bulk loading.. http://hbase.apache.org/book.html#arch.bulk.load If he's talking about using an Hbase table as the source of a MR job, then see this... http://hbase.apache.org/book.html#splitter On 5/25/12 2:35 AM, Florin P

Re: Pagination in hbase

2012-05-24 Thread Doug Meil
Hi there- This too... http://hbase.apache.org/book.html#scan ... you can do what you want by using the start/stop rows. On 5/24/12 8:54 AM, Srikanth P. Shreenivas srikanth_shreeni...@mindtree.com wrote: Look at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#s
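
The start/stop-row approach to paging can be sketched without a cluster by using a sorted map as a stand-in for a table. This is only a simulation under stated assumptions: `StartStopRowPaging`, `scanPage`, and `nextStartRow` are hypothetical names, keys are ASCII strings, and the start row is treated as inclusive and the stop row as exclusive, matching the convention of a Scan's start/stop rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: a TreeMap plays the role of a table sorted by row key.
final class StartStopRowPaging {
    /** Returns up to pageSize row keys in [startRow, stopRow), in sorted order. */
    static List<String> scanPage(NavigableMap<String, String> table,
                                 String startRow, String stopRow, int pageSize) {
        List<String> keys = new ArrayList<>();
        for (String key : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (keys.size() == pageSize) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    /** Smallest key strictly greater than lastKey: append a zero character. */
    static String nextStartRow(String lastKey) {
        return lastKey + '\u0000';
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put(String.format("user#%03d", i), "value" + i);
        }
        String start = "user#";
        String stop = "user$"; // '$' sorts immediately after '#', so this bounds the "user#" prefix
        List<String> page;
        while (!(page = scanPage(table, start, stop, 2)).isEmpty()) {
            System.out.println(page);
            start = nextStartRow(page.get(page.size() - 1));
        }
    }
}
```

The client only needs to remember the last row key of the previous page; each new page is a fresh bounded scan rather than an offset into a long-lived one.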

RefGuide updated

2012-05-24 Thread Doug Meil
in the RefGuide. http://hbase.apache.org/book.html#hbck.in.depth I ported a big writeup on hbck to the RefGuide appendix that Jon H. wrote. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: RefGuide updated

2012-05-24 Thread Doug Meil
Also, 2 weeks ago the pseudo-distributed config and extras standalone web page was moved and refactored into the Configuration chapter of the RefGuide. That came up in conversation yesterday at the hackathon and I wanted to point that out. On 5/24/12 9:17 AM, Doug Meil doug.m

Re: Get the list of store/store files for a region via HBase API

2012-05-16 Thread Doug Meil
) for use cases that need more complicated meta information. 2. If a region is removed from meta store, there exists a timing between the time it is removed from meta store and that it is physically removed from HDFS. That said, it is not reliable. Chen On Tue, May 15, 2012 at 10:06 PM, Doug Meil

Re: Get the list of store/store files for a region via HBase API

2012-05-15 Thread Doug Meil
You can get the Table-Region-StoreFile information via HDFS. That is described here in the RefGuide: http://hbase.apache.org/book.html#trouble.namenode On 5/15/12 5:09 PM, Chen Song chen.song...@gmail.com wrote: I am new to HBase and started working on a project which needs meta

Re: Get the list of store/store files for a region via HBase API

2012-05-15 Thread Doug Meil
information is within HBase world. Thanks Chen On Tue, May 15, 2012 at 5:23 PM, Doug Meil doug.m...@explorysmedical.com wrote: You can get the Table-Region-StoreFile information via HDFS. That is described here in the RefGuide: http://hbase.apache.org/book.html#trouble.namenode On 5/15/12 5:09

Re: Could an EC2 machine to 4 times slower than local dev workstation?

2012-05-15 Thread Doug Meil
For the record, what Andrew/Li said is pretty much the standard disclaimer in the Performance chapter for EC2. It's a separate class of performance problem. http://hbase.apache.org/book.html#perf.ec2 On 5/15/12 8:04 PM, Andrew Purtell apurt...@apache.org wrote: It's not just a matter of

Re: Switching existing table to Snappy possible?

2012-05-09 Thread Doug Meil
I'll update the RefGuide with this. This is a good thing for everybody to know. On 5/9/12 5:08 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Just alter the families, the old store files will get converted during compaction later on. J-D On Wed, May 9, 2012 at 2:06 PM, Otis

Re: Looking for a single row - HTable.get(Get) or Scan(Get)

2012-05-09 Thread Doug Meil
Also, there is multi-Get support as of 0.90.x to further optimize the RPC calls if you need to make a bunch of calls. On 5/9/12 4:47 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: What Bryan said, also Scan(Get) is used internally in the region server code so that's probably why that

Re: Sequential read-update HFile

2012-05-08 Thread Doug Meil
Once a StoreFile is written to disk it is never updated. http://hbase.apache.org/book.html#regions.arch So the options are either Puts (like you've been describing - which will create new StoreFiles per MemStore flush) or bulk loading HFiles (new StoreFiles). Out of curiosity, are you

Re: HBase HDFS disk space usage

2012-05-07 Thread Doug Meil
You're right, it's not currently a metric. But there is an entry for the disk usage here... http://hbase.apache.org/book.html#trouble.namenode On 5/6/12 10:41 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, Does HBase know how much space it is occupying on HDFS? I looked at

Re: Hbase CopyTable timeout on scanner

2012-05-07 Thread Doug Meil
One thing you might want to watch out for is that if you are starting with 421 regions on the source but the dest table isn't pre-split then it's going to try to slam all the data into one region and then have to split (and split and split, etc.). http://hbase.apache.org/book.html#perf.writing
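
As a rough illustration of pre-splitting, here is one way split points could be computed ahead of table creation. It is a sketch under a strong assumption that is not from the thread: that row keys begin with a uniformly distributed four-hex-digit prefix. `PreSplitPoints` and `hexSplitPoints` are hypothetical names, and the table-creation call that would consume the split keys is deliberately left out. When copying from an existing table, reusing the source table's region boundaries is likely the better fit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes keys start with a uniformly distributed 4-hex-digit prefix.
final class PreSplitPoints {
    /** Returns regions-1 split points that divide the range 0000..ffff into equal slices. */
    static List<String> hexSplitPoints(int regions) {
        if (regions < 2 || regions > 65536) {
            throw new IllegalArgumentException("regions must be in 2..65536");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            // Boundary i of `regions` equal slices of the 16-bit prefix space.
            splits.add(String.format("%04x", (long) i * 65536 / regions));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(hexSplitPoints(4));
        System.out.println(hexSplitPoints(421).size() + " split points for 421 regions");
    }
}
```

With the destination table created on these boundaries, the initial load is spread across all regions from the first write instead of funnelling through one region that then has to split repeatedly.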

Re: terrible! I can't drop the table

2012-05-07 Thread Doug Meil
Harsh pretty much summed it up already (e.g., don't do that) but below is some further reading of what just happened... http://hbase.apache.org/book.html#arch.catalog http://hbase.apache.org/book.html#trouble.namenode ... META is just an HBase table under the covers. By deleting the table on

Re: terrible! I can't drop the table

2012-05-07 Thread Doug Meil
it was ok to delete everything... In your case, you may have to figure out how to export some data first, as I don't know exactly what effect deleting that temp directory will have) Good luck! --Tom On Monday, May 7, 2012, Doug Meil wrote: Harsh pretty much summed it up already (e.g

Re: terrible! I can't drop the table

2012-05-07 Thread Doug Meil
with no data ? On 7 May 2012 22:19, Doug Meil doug.m...@explorysmedical.com wrote: Because you did this... hadoop fs -rmr /hbase/cjjWaitHash ... your data is gone. Per... http://hbase.apache.org/book.html#trouble.namenode ... that's where StoreFiles are kept for that particular table. On 5/7

Re: HBase HDFS disk space usage

2012-05-07 Thread Doug Meil
a JIRA issue for this? Thanks, Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Doug Meil doug.m...@explorysmedical.com To: user@hbase.apache.org user@hbase.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com

Re: region size

2012-05-02 Thread Doug Meil
re: with lackluster performance for random reads You want to be on CDH3u3 for sure if you want to boost random read performance. On 5/2/12 5:29 PM, Paul Mackles pmack...@adobe.com wrote: I think the answer to this is no, but I am hoping someone with more experience can confirm this... we are

Re: Best Hbase Storage for PIG

2012-04-26 Thread Doug Meil
Hi there, as a sanity check with respect to writing have you double-checked this section of the RefGuide.. http://hbase.apache.org/book.html#perf.writing ... regarding pre-created regions and monotonically increasing keys? Also as a sanity check refer to this case study as a diagnostic

Re: Are minor compaction and major compaction different in HBase 0.92?

2012-04-26 Thread Doug Meil
There is also a description of the compaction file-selection algorithm in here... http://hbase.apache.org/book.html#regions.arch (section 8.7.5.5) On 4/26/12 5:28 PM, Robby robby.verkuy...@gmail.com wrote: Yes, as per Lars' book: Minor compactions can be promoted to major compactions if

Re: Hbase Quality Of Service: large standarad deviation in insert time while inserting same type of rows in Hbase

2012-04-25 Thread Doug Meil
Hi there- In addition to what was said about GC, you might want to double-check this... http://hbase.apache.org/book.html#performance ... as well as this case-study for performance troubleshooting http://hbase.apache.org/book.html#casestudies.perftroub On 4/24/12 9:58 PM, Michael Segel

Re: Storing extremely large size file

2012-04-18 Thread Doug Meil
This would be a good entry for the new Use Cases chapter in the Schema Design section. On 4/18/12 1:02 AM, lars hofhansl lhofha...@yahoo.com wrote: I disagree. This comes up frequently and some basic guidelines should be documented in the Reference Guide. If it is indeed not difficult than

Re: Basic -Hbase table question

2012-04-18 Thread Doug Meil
Hi there- Because your topic is webcrawling, you might want to read the BigTable paper because the example in that paper is about webcrawling. You can find that, and other info, in the RefGuide... http://hbase.apache.org/book.html#other.info.papers On 4/18/12 2:08 PM, petri koski

Re: Hbase Map/reduce-How to access individual columns of the table?

2012-04-17 Thread Doug Meil
Hi there- Have you seen the chapter on MR in the RefGuide? http://hbase.apache.org/book.html#mapreduce.example You use the Result instance just like you would from a client program. On 4/17/12 6:07 AM, Ram rumshe...@gmail.com wrote: have a table called User with two columns ,one called

Re: TIMERANGE performance on uniformly distributed keyspace

2012-04-14 Thread Doug Meil
Hi there- With respect to: * Does it need to hit every memstore and HFile to determine if there is data available? And if so does it need to do a full scan of that file to determine the records qualifying to the timerange, since keys are stored lexicographically? And... Using scan 'table',

Re: TIMERANGE performance on uniformly distributed keyspace

2012-04-14 Thread Doug Meil
. This work is done in StoreScanner#selectScannersFrom Cheers, N. On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- With respect to: * Does it need to hit every memstore and HFile to determine if there is data available? And if so does it need to do a full

Re: Is HBase Thread-Safety?

2012-04-13 Thread Doug Meil
Hi there- Especially with respect to the caching, HBase has a block cache so I think it would be a good idea to review the architecture chapter http://hbase.apache.org/book.html#architecture On 4/13/12 3:45 AM, Bing Li lbl...@gmail.com wrote: NNever, Thanks so much for your answers!

Re: Doumentation broken

2012-04-13 Thread Doug Meil
Looks good. Thanks stack! On 4/13/12 2:47 PM, Stack st...@duboce.net wrote: On Fri, Apr 13, 2012 at 5:05 AM, Doug Meil doug.m...@explorysmedical.com wrote: Stack was still working on the site as of the end of day yesterday... Sorry about that. Our site needed a bit of updating

Re: Is HBase Thread-Safety?

2012-04-12 Thread Doug Meil
re: Is HBase thread-safety? HTable instances are not thread safe, though. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html On 4/12/12 6:10 PM, Bing Li lbl...@gmail.com wrote: Dear all, Is HBase thread-safety? Do I need to consider the consistency issue when

Re: hbase map/reduce questions

2012-04-05 Thread Doug Meil
. with the default behavior only two nodes will work for a map/reduce task., isn't it ? if i do a custom input that split the table by 100 rows, can i distribute manually each part on a node regardless where the data is ? On 5 April 2012 00:36, Doug Meil doug.m...@explorysmedical.com wrote: The default

Re: How to debug and run hadoop/HBase source code in eclipse

2012-04-04 Thread Doug Meil
Hi there- See... http://hbase.apache.org/book.html#developer On 4/4/12 8:02 AM, Asmi smita.j...@gmail.com wrote: Hi, May I know is there any book just like you suggested for HBase to make the changes. Asmi.

Re: hbase map/reduce questions

2012-04-04 Thread Doug Meil
Hi there, you probably want to see this.. http://hbase.apache.org/book.html#splitter ... as well as this... http://hbase.apache.org/book.html#regions.arch.locality ... as the latter describes data locality. On 4/4/12 7:41 AM, sdnetwork sdnetw...@gmail.com wrote: Hello, I started

Re: hbase map/reduce questions

2012-04-04 Thread Doug Meil
The default behavior is that the input splits are where the data is stored. On 4/4/12 5:24 PM, sdnetwork sdnetw...@gmail.com wrote: ok thanks, but i don't find the information that tell me how the result of the split is distributed across the different node of the cluster ? 1) randomly ?

Re: HBase database sample

2012-04-02 Thread Doug Meil
See the link to the BigTable paper here... http://hbase.apache.org/book.html#other.info ... and there is other reading material and videos too. On 4/1/12 11:30 PM, Mahdi Negahi negahi.ma...@hotmail.com wrote: thanks, but all databases have good examples , like Cinema in Neo4j and etc. but

Re: HBase database sample

2012-04-02 Thread Doug Meil
Also, see this chapter. http://hbase.apache.org/book.html#schema On 4/2/12 11:40 AM, Doug Meil doug.m...@explorysmedical.com wrote: See the link to the BigTable paper here... http://hbase.apache.org/book.html#other.info ... and there is other reading material and videos too. On 4/1

Re: HBaseCon Where is the information?

2012-04-02 Thread Doug Meil
HBaseCon is also on the home page... http://hbase.apache.org/ On 4/2/12 3:18 PM, Lars George lars.geo...@gmail.com wrote: http://www.hbasecon.com/ On Apr 2, 2012, at 10:16 PM, Marcos Ortiz wrote: I heard yesterday that the first conference dedicated to HBase will be in the next days.

Re: 0.92 and Read/writes not scaling

2012-03-30 Thread Doug Meil
Just as a quick reminder regarding what Todd mentioned, that's exactly what was happening in this case study... http://hbase.apache.org/book.html#casestudies.slownode ... although it doesn't appear to be the problem in this particular situation. On 3/29/12 8:22 PM, Juhani Connolly

Re: HBase bulk loader doing speculative execution when it set to false in mapred-site.xml

2012-03-30 Thread Doug Meil
Speculative execution is on by default in Hadoop. One of the Performance recommendations in the Hbase RefGuide is to turn it off. On 3/30/12 6:12 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Well that's not an HBase configuration, that's Hadoop. I'm not sure if this is listed

HBase RefGuide updated

2012-03-28 Thread Doug Meil
Hi folks- The HBase RefGuide has been updated on the website. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: HBase RefGuide updated

2012-03-28 Thread Doug Meil
each of those links is a separate study in this new chapter. From: Doug Meil doug.m...@explorysmedical.com Date: Wed, 28 Mar 2012 17:50:01 -0400 To: user@hbase.apache.org user@hbase.apache.org Subject

Re: Confirming a Bug

2012-03-23 Thread Doug Meil
Speculative execution is on by default. http://hbase.apache.org/book.html#mapreduce.specex On 3/23/12 8:04 AM, Peter Wolf opus...@gmail.com wrote: Hi Michel, I agree it doesn't make sense, but then I believe we are tracking a bug. I don't know about speculative execution, but I certainly

Re: Scan.addFamiliy reduces results

2012-03-15 Thread Doug Meil
re: However, I am getting different number of results, depending on which families are added Yes. I'd suggest you read this in the RefGuide. http://hbase.apache.org/book.html#datamodel On 3/15/12 12:08 PM, Peter Wolf opus...@gmail.com wrote: Hi all, I am doing a scan on a table with

Re: partial scanning

2012-03-14 Thread Doug Meil
Scans are also described in the RefGuide here... http://hbase.apache.org/book.html#data_model_operations On 3/14/12 2:22 AM, Akbar Gadhiya akbar.gadh...@gmail.com wrote: Hi, You can perform scan this way, scan 'tablename', {STARTROW => 'name + start time stamp', ENDROW => 'name + end time

Re: HBase rowkey RDBMS PK

2012-03-14 Thread Doug Meil
Hi there- You probably want to see this in the RefGuide... http://hbase.apache.org/book.html#schema On 3/14/12 9:47 PM, 韶隆吴 yechen1...@gmail.com wrote: Hi all: I'm trying to import data from oracle to hbase and now I have a problem. In some tables,there have more than one primary

Re: Retrieve Column Family and Column with Java API

2012-03-12 Thread Doug Meil
Hi there- You probably want to see this... http://hbase.apache.org/book.html#dm.column.metadata You can get the CF's from HTableDescriptor. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html On 3/12/12 10:10 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote:

Re: hbase performance issue

2012-03-11 Thread Doug Meil
If you're using Cloudera, you want to be on CDH3u3 because it has several HDFS performance fixes for low-latency reads. That still doesn't address your 23:00-hour perf issue, but that's something that will help. On 3/11/12 3:39 PM, Антон Лыска ant...@wildec.com wrote: Hi guys! I have a

Re: example of mapreduce output to hbase

2012-03-11 Thread Doug Meil
Hi there- Have you seen the examples in here? http://hbase.apache.org/book.html#mapreduce On 3/11/12 4:59 PM, Weishung Chung weish...@gmail.com wrote: Hey users, I am trying to store mapreduce output directly to HBase. Basically I have a regular mapper reading from files and would like

Re: question in retrieving data from hbase

2012-03-10 Thread Doug Meil
Hi there- There is a chapter in the Hbase RefGuide on the Hbase data model that might be helpful. http://hbase.apache.org/book.html#datamodel On 3/10/12 1:30 AM, newbie24 shripri...@hotmail.com wrote: Thanks Harsh..little confused ..want to clarify some more the row key i have is a

Re: Memory Requirements

2012-03-10 Thread Doug Meil
Hi there- Here are the recommendations from the HBase RefGuide: http://hbase.apache.org/book.html#perf.os ... and they are consistent with what the book says (recommends 64-bit OS and more memory). Also, keep this in mind... http://hbase.apache.org/book.html#arch.overview ... the

Re: HTable.getEndKeys() returning empty results

2012-03-08 Thread Doug Meil
I believe that's covered here... http://hbase.apache.org/book.html#arch.catalog Notes on HRegionInfo: the empty key is used to denote table start and table end. A region with an empty start key is the first region in a table. If region has both an empty start and an empty end key, its
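
The empty-key convention quoted above can be illustrated with a small lookup sketch: regions are indexed by start key, and the empty string marks the start of the first region and the end of the last. This is a simplified model with string keys, not HBase's own classes; `RegionLookup`, `Region`, `regionsFor`, and `locate` are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a table's regions; "" stands for the empty start/end key.
final class RegionLookup {
    static final class Region {
        final String startKey; // inclusive; "" means start of table
        final String endKey;   // exclusive; "" means end of table

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean isFirst() { return startKey.isEmpty(); }
        boolean isLast() { return endKey.isEmpty(); }
    }

    /** Builds contiguous regions from sorted split points; no split points gives one region. */
    static NavigableMap<String, Region> regionsFor(String... sortedSplitPoints) {
        NavigableMap<String, Region> byStartKey = new TreeMap<>();
        String start = "";
        for (String split : sortedSplitPoints) {
            byStartKey.put(start, new Region(start, split));
            start = split;
        }
        byStartKey.put(start, new Region(start, ""));
        return byStartKey;
    }

    /** The owning region is the one with the greatest start key <= rowKey. */
    static Region locate(NavigableMap<String, Region> byStartKey, String rowKey) {
        // "" is always present as a start key and sorts first, so floorEntry is never null.
        return byStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        NavigableMap<String, Region> regions = regionsFor("g", "p");
        for (String row : new String[] {"apple", "grape", "zebra"}) {
            Region r = locate(regions, row);
            System.out.println(row + " -> [" + r.startKey + ", " + r.endKey + ")");
        }
    }
}
```

In this model the last region always reports an empty end key, and a table with a single region reports empty keys for both start and end.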

Re: a strange situation of HBase when I issue scan '.META.' command

2012-03-05 Thread Doug Meil
Hi there- You might want to see this in the Ref Guide. http://hbase.apache.org/book.html#arch.catalog A region with an empty start key is the first region in a table. If region has both an empty start and an empty end key, its the only region in the table On 3/5/12 7:27 AM, yonghu

Re: HBase BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Doug Meil
re: Almost every blog I read about HBase tells me it's a clone of BigTable. The HBase website says that too http://hbase.apache.org/ re: Almost every blog I've read about HBase also tells me to use a lot of RAM So does the Hbase Reference Guide...

Re: HBase Region move() and Data Locality

2012-03-05 Thread Doug Meil
This doesn't address your question on move(), but regarding locality, see 8.7.3 in here... http://hbase.apache.org/book.html#regions.arch .. it's not just major compactions, but any write of a storefile that affects locality (flush, minor, major). On 3/5/12 11:02 AM, Bryan Beaudreault

Re: What about storing binary data(e.g. images) in HBase?

2012-03-04 Thread Doug Meil
Agree with what Michael says. Here is the section from the RefGuide on this topic... http://hbase.apache.org/book.html#supported.datatypes ... it's yes you can, as long as you aren't storing 'huge' things On 3/4/12 12:24 PM, Michael Segel michael_se...@hotmail.com wrote: It depends on your

Re: gc pause killing regionserver

2012-03-03 Thread Doug Meil
Hi there- You probably want to read this in the Ref Guide... http://hbase.apache.org/book.html#jvm On 3/3/12 8:05 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote: Hi, I'm running regionservers with 2GB heap and following tuning options: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

RefGuide updated

2012-03-02 Thread Doug Meil
Hi there folks- The HBase Reference Guide has been updated. One new entry I'd like to point out is a troubleshooting case study… http://hbase.apache.org/book.html#trouble.casestudy … to serve as a blueprint on how to do cluster diagnosis. Doug Meil Chief Software Architect, Explorys doug.m

Re: IN_MEMORY=true settings question

2012-03-02 Thread Doug Meil
Also this... http://hbase.apache.org/book.html#perf.schema (which also has a link to the block cache link that JD cited below) On 3/2/12 2:12 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It's just a priority in the block cache, read more at:

Re: Hbase Table has many column families

2012-03-02 Thread Doug Meil
6 tables with 1 CF apiece is a perfectly reasonable approach, particularly if the data is to be processed separately. If the data is all to be processed at the same time (e.g., regularly all in one MR job) then you might want to consider a single table with 1 CF and heterogeneous rows. On

Re: Scanning the last N rows

2012-03-02 Thread Doug Meil
Hi there- Take a look at this section of the book... http://hbase.apache.org/book.html#reverse.timestamp On 3/2/12 4:02 PM, Peter Wolf opus...@gmail.com wrote: Hello all, I want to retrieve the most recent N rows from a table, with some column qualifiers. I can't find a Filter, or
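
The reverse-timestamp idea can be sketched as follows: store `Long.MAX_VALUE - timestamp` in the key so that the newest row sorts first, and "the last N rows" becomes "the first N rows of a scan". This is a plain-Java simulation that assumes non-negative timestamps; `ReverseTimestampKeys` and its methods are hypothetical names, and a TreeMap with an unsigned byte comparator stands in for the table's sort order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper; assumes timestamps are non-negative.
final class ReverseTimestampKeys {
    /** Encodes Long.MAX_VALUE - timestamp as 8 big-endian bytes. */
    static byte[] reverseKey(long timestampMillis) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    /** Unsigned lexicographic byte comparison, the order in which row keys are kept. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Reads the first n values in key order, which are the n most recent. */
    static List<String> newest(TreeMap<byte[], String> table, int n) {
        List<String> out = new ArrayList<>();
        for (String value : table.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = new TreeMap<>(ReverseTimestampKeys::compareBytes);
        table.put(reverseKey(1000L), "event at t=1000");
        table.put(reverseKey(3000L), "event at t=3000");
        table.put(reverseKey(2000L), "event at t=2000");
        System.out.println(newest(table, 2));
    }
}
```

In a real schema the reversed timestamp would usually follow some leading key component (a user or device id, say), so that the newest rows for one entity sit together at the front of that entity's key range.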

Re: Scanning the last N rows

2012-03-02 Thread Doug Meil
Reference Guide, I mean. Not Book. Reference Guide. :-) On 3/2/12 4:31 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- Take a look at this section of the book... http://hbase.apache.org/book.html#reverse.timestamp On 3/2/12 4:02 PM, Peter Wolf opus...@gmail.com wrote

Re: Scanning the last N rows

2012-03-02 Thread Doug Meil
with result... } Do I have to worry about efficiency? Is the Server madly retrieving rows, in the background, that the Client will never use? Thanks P On 3/2/12 4:31 PM, Doug Meil wrote: Hi there- Take a look at this section of the book... http://hbase.apache.org/book.html#reverse.timestamp

Re: is it possible to run mapreduce on an hbase standalone instance?

2012-03-02 Thread Doug Meil
Yes, you can do MR against standalone Hbase (it uses the LocalJobRunner just like stand-alone Hadoop). I'd focus on this error... 12/03/02 21:42:13 ERROR zookeeper.ZKConfig: no clientPort found in zoo.cfg On 3/2/12 6:14 PM, T Vinod Gupta tvi...@readypulse.com wrote: hi, im wondering if

Re: Set num of mappers in a TableMapReduceUtil.initTableMapperJob

2012-02-29 Thread Doug Meil
You probably want to see this... http://hbase.apache.org/book.html#splitter On 2/29/12 7:48 PM, Vrushali C vrush...@ymail.com wrote: I am using TableMapReduceUtil.initTableMapperJob to initiate a map reduce job that scans the entire table and processes records in it. I wanted to know

Re: HBase Newbie

2012-02-26 Thread Doug Meil
For a simple use case I'd read the BigTable paper. http://research.google.com/archive/bigtable.html On 2/26/12 1:24 AM, Srinivas Reddy hbaselearn...@gmail.com wrote: Hi All, I have good experience on Hadoop administration, Pig , Hive and Sqoop. Interested in learning HBase

Re: Up and running HDFS / HBase Installation

2012-02-24 Thread Doug Meil
re: Coming from a RDBMS background, I am thinking there has to be a set of files under hbase. You're correct, there are. For more info on that structure, see... http://hbase.apache.org/book.html#regions.arch On 2/24/12 5:45 AM, Admin Absoftinc absoft...@gmail.com wrote: I have a up and

Re: How is Data Indexed in HBase?

2012-02-22 Thread Doug Meil
You probably want to start with reading about the StoreFiles and how Hbase stores data internally. http://hbase.apache.org/book.html#regions.arch On 2/22/12 4:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data

Re: hbase delete operation is very slow

2012-02-21 Thread Doug Meil
Hi there- You probably want to see this... http://hbase.apache.org/book.html#perf.deleting .. that particular method doesn't use the write-buffer and is submitting deletes one-by-one to the RS's. On 2/21/12 3:52 PM, Haijia Zhou leons...@gmail.com wrote: Hi, All I'm new to this email list

Re: hbase delete operation is very slow

2012-02-21 Thread Doug Meil
. On 2/21/12 7:39 PM, Stack st...@duboce.net wrote: On Tue, Feb 21, 2012 at 2:45 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- You probably want to see this... http://hbase.apache.org/book.html#perf.deleting .. that particular method doesn't use the write-buffer

Re: Max region file size

2012-02-19 Thread Doug Meil
Bryan, if you're on 0.90.x the advice is to not go above 4gb regions. The Hfilev2 format is in 0.92 which better supports larger regions. On 2/18/12 5:31 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi Bryan The size is the size of the store file, if the size go above this you will

Re: Scans and Bloom Filter

2012-02-16 Thread Doug Meil
Good stuff Nicholas, I'll add this to the book. On 2/16/12 3:52 PM, Nicolas Spiegelberg nspiegelb...@fb.com wrote: Bryan, Currently, ROW ROWCOL Bloom Filters are only checked for explicit, single-row 'Get' scans. ROWCOL BFs are only checked when you're querying for explicit column

Re: how get() works

2012-02-14 Thread Doug Meil
requested rowKey? Is it by linear search or binary search or any other algorithm? Or for every row in that region, is there any hash value stored and hash lookup takes place to get that rowKey's value? what happens really within that region? On Mon, Feb 13, 2012 at 7:35 PM, Doug Meil doug.m

Re: how get() works

2012-02-14 Thread Doug Meil
I say basically because inside a Region there are Stores, and for each Store there are StoreFiles. For more info see: http://hbase.apache.org/book.html#regions.arch On 2/14/12 11:06 AM, Doug Meil doug.m...@explorysmedical.com wrote: Keys are stored in sorted order, it's basically

Re: length and size of a column family name or qualifier vs. amount of disk storage

2012-02-14 Thread Doug Meil
Also see here... http://hbase.apache.org/book.html#keyvalue Compression will make it better on disk, but it will inflate over the wire. On 2/14/12 12:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: We are assuming the longer cf/qual would be written to HDFS billions of times and

Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-14 Thread Doug Meil
To stress what JD just said, the HBase book/Ref Guide (i.e., the online book that is a part of HBase) is open source and the best source of the material (especially the Troubleshooting chapter) is user experience. Minor clarification: HBase the Definitive Guide is a great book by O'Reilly, but

Re: how get() works

2012-02-13 Thread Doug Meil
re: Now if that RegionServer has multiple regions on it, how does the request get transferred to a correct region which has the requested rowKey? See... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29 As described in

Re: Which server store the root and .meta. information?

2012-02-12 Thread Doug Meil
around in books... but didn't find, maybe something to do with CatalogJanitor?) Thx, Eric On 11/02/12 20:12, Doug Meil wrote: Regarding the master being down, just be aware that if you lose an RS that you'll have issues because the master is what does the reassignment. Per the previous comments

Re: Which server store the root and .meta. information?

2012-02-11 Thread Doug Meil
Regarding the master being down, just be aware that if you lose an RS that you'll have issues because the master is what does the reassignment. Per the previous comments, at steady-state HBase can run without the master - there's an asterisk. On 2/11/12 11:31 AM, Eric Charles e...@apache.org

Re: Which server store the root and .meta. information?

2012-02-10 Thread Doug Meil
Also, there is a description of what is in META and ROOT in here... http://hbase.apache.org/book.html#arch.catalog ... and it also describes the startup sequencing. On 2/10/12 10:46 AM, Harsh J ha...@cloudera.com wrote: The client does communicate with the master to perform .META.

Book/RefGuide updated

2012-02-10 Thread Doug Meil
for coprocessors was created in Arch/RegionServer has been added, pointing to the great blog entry on the subject. Eventually this can be merged to the book, but at least there's a link to it now. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: Is it possible to scan the whole table without specify column classifier?

2012-02-09 Thread Doug Meil
Yep! You got it. Don't forget to always close the ResultScanner http://hbase.apache.org/book.html#data_model_operations On 2/9/12 6:35 AM, Listas Discussões lis...@arianpasquali.com wrote: I've just realized how to do it there is no need to specify column at the scan. here is the

Re: Writing to HBase from the Hadoop reduce

2012-02-08 Thread Doug Meil
Hi there- In addition to what Stack said, you probably want to review these: http://hbase.apache.org/book.html#mapreduce http://hbase.apache.org/book.html#performance On 2/8/12 10:51 AM, Stack st...@duboce.net wrote: On Wed, Feb 8, 2012 at 3:15 AM, Vladi Feigin vladi.fei...@nice.com wrote:

Re: storing logs in hbase

2012-02-05 Thread Doug Meil
Hi there- You probably want to check out these chapters of the Hbase ref guide: http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema http://hbase.apache.org/book.html#mapreduce ... and with respect to the 40 minutes per report, a common pattern is to create

Re: storing logs in hbase

2012-02-05 Thread Doug Meil
... but it depends on what you want to do. If you want full-text searching, then yes, you probably want to look at Lucene. If you want activity analysis, summaries are probably better. On 2/5/12 1:54 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- You probably want to check

Re: HBase coprocessors blog posted

2012-02-01 Thread Doug Meil
I'll make sure the book links to this. On 2/1/12 3:26 AM, Mingjie Lai m...@apache.org wrote: Hi hbasers. A hbase blog regarding coprocessors has been posted to apache blog site. Here is the link: https://blogs.apache.org/hbase/entry/coprocessor_introduction Your comments are welcome.

Re: PerformanceEvaluation results

2012-02-01 Thread Doug Meil
Hi there- These perf-tests on small clusters are fairly common questions on the dist-list, but it needs to be stressed that Hbase (and HDFS) doesn't begin to stretch its legs until about 5 nodes. http://hbase.apache.org/book.html#arch.overview On 2/1/12 7:51 AM, Tim Robertson

Re: type mismatch in mapreduce program

2012-01-28 Thread Doug Meil
In addition, see... http://hbase.apache.org/book.html#mapreduce.example On 1/28/12 6:43 AM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: 2012/1/28 Vamshi Krishna vamshi2...@gmail.com: Hi, here i am trying to read rows from a table, and put them to a file as it is.For that my mapper class

Re: Speeding up Scans

2012-01-25 Thread Doug Meil
Hi there- Quick sanity check: what caching level are you using? (default is 1) I know this is basic, but it's always good to double-check. If language is already in the lead position of the rowkey, why use the filter? As for EC2, that's a wildcard. On 1/25/12 7:56 AM, Peter Wolf

Re: hbase and apache.org down

2012-01-25 Thread Doug Meil
It's back up. Never mind. On 1/25/12 10:33 AM, Doug Meil doug.m...@explorysmedical.com wrote: Not only is the hbase website down, but apache.org appears to be down. http://www.downforeveryoneorjustme.com/hbase.apache.org http://www.downforeveryoneorjustme.com/www.apache.org Doug Meil

Re: Speeding up Scans

2012-01-25 Thread Doug Meil
;-) Adding the following speeded things up quite a bit scan.setCacheBlocks(true); scan.setCaching(1000); Thank you, it was a duh! P On 1/25/12 8:13 AM, Doug Meil wrote: Hi there- Quick sanity check: what caching level are you using? (default is 1) I know this is basic, but it's

Re: Important Question

2012-01-25 Thread Doug Meil
Because you specifically cited the medical domain in your question, I think you might want talk to Explorys (disclaimer: I work there). Otherwise, you probably want to look at the HBase book. On 1/25/12 11:30 AM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: So what about HBQL?? And if i

Re: Speeding up Scans

2012-01-25 Thread Doug Meil
scan.setCacheBlocks(true); scan.setCaching(1000); Thank you, it was a duh! P On 1/25/12 8:13 AM, Doug Meil wrote: Hi there- Quick sanity check: what caching level are you using? (default is 1) I know this is basic, but it's always good to double-check. If language is already

Re: Important Question

2012-01-25 Thread Doug Meil
Hi there- As someone who works with medical data I take such analysis very seriously, but according to the World Health Organization there were 608 cases of measles reported in Egypt in 2011 (page 82). Granted, these are probably incidence and not prevalence statistics, but the order of
