Re: HBase connection pool

2011-07-18 Thread Doug Meil
Hi there, you probably want to start with this first. http://hbase.apache.org/book.html#client On 7/18/11 2:16 AM, aadish kotwal.aad...@gmail.com wrote: Hey people, I am very new to HBase, and I would like if someone gave me guidance regarding connection pooling. Thanks a lot, in

Re: Basic information about HBase?

2011-07-18 Thread Doug Meil
Hi there- Read this for starters... http://hbase.apache.org/book.html On 7/18/11 10:24 AM, shanmuganathan.r shanmuganatha...@zohocorp.com wrote: Hi, I am the new user for the HBase .I installed Hadoop 0.20.2, Zookeeper 3.3.3, HBase 0.90.1 in three node cluster. They are

Re: loading data in HBase table using APIs

2011-07-18 Thread Doug Meil
-creation of regions]. In any case, don't reduce if you can avoid it. Dave -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: Sunday, July 17, 2011 4:40 PM To: user@hbase.apache.org Subject: Re: loading data in HBase table using APIs Hi there- Take a look

Re: HBase reading performance

2011-07-18 Thread Doug Meil
Hi there- Just taking a stab at something... http://hbase.apache.org/book.html#disable.splitting ... what is your compaction interval? Is it the default? On 7/18/11 8:21 PM, hmch...@tsmc.com hmch...@tsmc.com wrote: Hi there, HBase read performance works fine in most of the time, but

Re: HBase reading performance

2011-07-18 Thread Doug Meil
. Maybe we can modify compaction interval, any ideas? Thank you. Fleming Chiu(邱宏明) Ext: 707-2260 Be Veg, Go Green, Save the Planet! Doug Meil doug.meil@explorysmedical.com

Re: HBase MapReduce Zookeeper

2011-07-19 Thread Doug Meil
Hi there- re: that we have to reuse the Configuration object You are probably referring to this... http://hbase.apache.org/book.html#client.connections ... yes, that is general guidance on client connection.. re: do i have to create a pool of Configuration objects, to share them

Re: thread safety of incrementColumnValue

2011-07-20 Thread Doug Meil
Hi there- I think there are two subjects here: 1) the fact that HTable isn't thread-safe 2) how counters work Even if you are incrementing counters, you shouldn't be sharing HTable instances across threads. Counters get updated atomically on the RS, not on the client. Counter behavior

Re: thread safety of incrementColumnValue

2011-07-21 Thread Doug Meil
but I am not sure how HTablePool.get() would work under multithreaded environment. thanks On Wed, Jul 20, 2011 at 6:28 PM, Doug Meil doug.m...@explorysmedical.comwrote: Hi there- I think there are two subjects here: 1) the fact that HTable isn't thread-safe 2) how counters work Even

Re: Why use Reverse Timestamp as the Row Key?

2011-07-22 Thread Doug Meil
It's so that you can get the most recent entry with a Scan. Assuming that the key-structure (as explained in the book) is something like thingrev-timestamp. And you are trying to quickly find the most recent entry for thing. On 7/22/11 3:18 AM, edward choi mp2...@gmail.com wrote: Hi, I
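A minimal sketch of the reverse-timestamp idea, in plain Java with no HBase dependency. The key layout (a fixed-width thing identifier followed by an 8-byte Long.MAX_VALUE - timestamp) and the names ReverseTimestampKey, rowKey, compareKeys and mostRecent are assumptions for illustration. A TreeMap ordered by unsigned byte comparison stands in for a table whose rows are sorted by key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

final class ReverseTimestampKey {

    // Build <thing><Long.MAX_VALUE - timestamp>. A newer timestamp gives a
    // smaller suffix, so the newest entry for a thing sorts first.
    static byte[] rowKey(String thing, long timestampMillis) {
        byte[] prefix = thing.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned, byte-by-byte lexicographic comparison of two keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Return the value of the first row at or after the thing prefix.
    // Assumes fixed-width thing identifiers (hypothetical constraint), so
    // one thing's prefix is never a prefix of another thing's identifier.
    static String mostRecent(TreeMap<byte[], String> table, String thing) {
        byte[] prefix = thing.getBytes(StandardCharsets.UTF_8);
        Map.Entry<byte[], String> first = table.ceilingEntry(prefix);
        if (first == null || first.getKey().length < prefix.length) {
            return null;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (first.getKey()[i] != prefix[i]) {
                return null;
            }
        }
        return first.getValue();
    }

    public static void main(String[] args) {
        Comparator<byte[]> order = ReverseTimestampKey::compareKeys;
        TreeMap<byte[], String> table = new TreeMap<>(order);
        table.put(rowKey("thing-a", 1000L), "oldest");
        table.put(rowKey("thing-a", 2000L), "middle");
        table.put(rowKey("thing-a", 3000L), "newest");
        System.out.println(mostRecent(table, "thing-a"));
    }
}
```

With this layout, reading the first row at the thing prefix is enough to find the latest entry; no scan over the older rows is needed.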

Re: How to implement efficient bulk query

2011-07-22 Thread Doug Meil
, Nanheng Wu nanhen...@gmail.com wrote: That makes sense. So is there a limit on how large the batch size can be? Or, say if I pass all of my queries in one batch of size 10K, would that cause problems? On Fri, Jul 22, 2011 at 7:47 AM, Doug Meil doug.m...@explorysmedical.com wrote: That method

Re: Design/Schema questions

2011-07-27 Thread Doug Meil
Following up on what Stack said, make sure you read this.. http://hbase.apache.org/book.html#performance ... this chapter also refers to OpenTSDB for certain topics (especially key-design issues). On 7/26/11 4:27 PM, Stack st...@duboce.net wrote: On Tue, Jul 26, 2011 at 1:08 PM, Mark

Re: So Bad Random Read Performance

2011-07-30 Thread Doug Meil
For background information on HDFS-Hbase performance, reviewing the umbrella ticket is a good place to start... https://issues.apache.org/jira/browse/HDFS-1599 But if I am reading this correctly... I write 100M data in Hbase,only 1M key/value,so all keys are in only *one* Store,and only 2

Re: test for flags in HBase

2011-08-01 Thread Doug Meil
You could use an integer for the same purpose. On 8/1/11 5:26 PM, Ted Yu yuzhih...@gmail.com wrote: I am not aware of plan to implement this. Cheers On Mon, Aug 1, 2011 at 2:20 PM, Ioan Eugen Stan stan.ieu...@gmail.comwrote: Hello, Is there any way to set/test for a bit flag in HBase? I

Re: loading data in HBase table using APIs

2011-08-04 Thread Doug Meil
it sounds like he doesn't need to have any kind of reduce phase [and may be a great candidate for bulk loading and the pre-creation of regions]. In any case, don't reduce if you can avoid it. Dave -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: Sunday, July 17

Re: loading data in HBase table using APIs

2011-08-05 Thread Doug Meil
. From the OP's description it sounds like he doesn't need to have any kind of reduce phase [and may be a great candidate for bulk loading and the pre-creation of regions]. In any case, don't reduce if you can avoid it. Dave -Original Message- From: Doug Meil [mailto:doug.m

HBase wiki updates

2011-08-06 Thread Doug Meil
, Design questions and Operations/Troubleshooting questions. http://wiki.apache.org/hadoop/Hbase/FAQ Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: HBase book grammatical error

2011-08-08 Thread Doug Meil
Thanks Jeff. I'll get it this week. On 8/8/11 5:17 PM, Jeff Whiting je...@qualtrics.com wrote: http://hbase.apache.org/book.html#number.of.cfs In 6.2. On the number of column families it says HBase currently does not do well with anything about two or three column families so I

Re: Finding the trace of a query

2011-08-11 Thread Doug Meil
Hi there- Please see the Hbase book. http://hbase.apache.org/book.html This has a chapter on developing with Hbase. As for SQL, that's not supported directly because Hbase is a NoSQL database (see the Data Model chapter). Doug On 8/11/11 6:44 AM, Anurag Awasthi anuragawasth...@gmail.com

Re: Using HTable.batch - Still room for performance improvements?

2011-08-11 Thread Doug Meil
One thing you might want to look at is HTableUtil. It's on trunk, but you can look at the source and port it to whatever version you are using. We've found that region-sorting helps a lot by minimizing the number of RS calls in any given flush. On 8/11/11 5:57 PM, Jean-Daniel Cryans

Re: Generic Schema Question

2011-08-13 Thread Doug Meil
See this section in the Hbase book... 11.6.3. Close ResultScanners There is a snippet of how to use a Scan, which is what you'd want for that. I just realized that there should be a better Scan example in the Data Model chapter. I'll add it. Doug Meil Chief Software Architect, Explorys

Re: Column family limitations

2011-08-13 Thread Doug Meil
See the Hbase book... http://hbase.apache.org/book.html#number.of.cfs On 8/13/11 8:17 PM, Mark static.void@gmail.com wrote: I don't quite remember where but I think I remember hearing that you should only have a handful of column families (1-4) per table. Is this true and if so, why?

Re: HTable thread safe for read?

2011-08-14 Thread Doug Meil
I wouldn't do it... Some of the other committers can comment more on this, but there is state cached in HTable instances when scanning. E.g.,... protected class ClientScanner implements ResultScanner { private final Log CLIENT_LOG = LogFactory.getLog(this.getClass()); // HEADSUP: The

Re: Sequential column reading in the big row. Is it possible?

2011-08-15 Thread Doug Meil
Hi there- You probably want to review this part of the hbase book... http://hbase.apache.org/book.html#perf.reading On 8/15/11 4:15 AM, Andrey Gomzin gomzind...@gmail.com wrote: Hi! I use HBase in a single-node mode. My rows in the table are huge. I have to read sequentially all columns

Re: Column family limitations

2011-08-15 Thread Doug Meil
something I wish I would have understood back when I last designed an hbase schema. Dave -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: Saturday, August 13, 2011 6:21 PM To: user@hbase.apache.org Subject: Re: Column family limitations See the Hbase book

Re: HTable thread safe for read?

2011-08-16 Thread Doug Meil
and/or read data can be corrupted. Or maybe is HTable still thread-safe for simple Gets only? Thx. On 14/08/11 15:29, Doug Meil wrote: I wouldn't do it... Some of the other committers can comment more on this, but there is state cached in HTable instances when scanning. E.g

Re: operational overhead for HBase

2011-08-16 Thread Doug Meil
One of the things on my to-do list is to reorganize the Tools appendix into a full-fledged Operations chapter. Even though that's a few weeks out, there are still a lot of points in there worth noting. Especially in the subject of backup (Master and RegionServers, etc.). That's arguably the

Re: Accessing a sparate HBase cluster

2011-08-17 Thread Doug Meil
To be on the safe side, you probably want to double-check this. http://hbase.apache.org/book.html#client_dependencies On 8/17/11 3:00 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi, I want to separate my application machines from the HBase cluster. So far, we have always run the

Re: Versioning

2011-08-17 Thread Doug Meil
Versioning can be used to see the previous state of a record. Some people need this feature, others don't. One thing that may be worth a review is this... http://hbase.apache.org/book.html#keysize ... and specifically the fact about all the values being freighted with timestamp (aka version)

Re: loading data in HBase table using APIs

2011-08-18 Thread Doug Meil
: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink St.Ack On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil doug.m...@explorysmedical.com wrote: It's not obvious to a lot of newer folks that an MR job can exist minus the R. On 8/4/11 5:52 PM

Re: HBase tables snippet

2011-08-21 Thread Doug Meil
If you just need a sample of rows from tables, use the client API (e.g., HTable) and scan off a few hundred rows or whatever you need. The harder part of local unit testing is getting a coherent set of rows that makes sense between all the tables - but that's something that every developer has

Re: mvn eclipse:eclipse fails to create project from hbase-common-trunk

2011-08-23 Thread Doug Meil
Are you using maven 3 or maven 2? Should use 2. That's a doc-item that is currently missing which I have changed locally (but not patched yet). On 8/23/11 9:28 AM, Anurag Awasthi anuragawasth...@gmail.com wrote: Hello, I am trying to modify the source code and for that I am following the

Re: Versioning

2011-08-26 Thread Doug Meil
events, or messages of Facebook. In these cases, what is the trade off between saving them in different rows, and in different versions of one row? Thank you. Sean 2011/8/18 Doug Meil doug.m...@explorysmedical.com Versioning can be used to see the previous state of a record. Some

Re: CopyTable

2011-08-29 Thread Doug Meil
Hi there- http://hbase.apache.org/book.html#copytable On 8/29/11 9:50 AM, Steinmaurer Thomas thomas.steinmau...@scch.at wrote: Hello, for test purposes, I would like to have an identical copy of an existing table under a different name on the same cluster. Basically I found the example

Re: book update in progress

2011-08-29 Thread Doug Meil
Hi folks- Seems like everything got up there ok. Let me know if anybody has any issues. On 8/29/11 5:15 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi folks- The HBase book has been updated... http://hbase.apache.org/book.html There are a LOT of changes in here... * Schema Design

Re: Connect In HBase

2011-09-02 Thread Doug Meil
Hi there- Have you read this? http://hbase.apache.org/book.html#client_dependencies On 9/2/11 9:53 AM, Diego Gomes Araújo diegogomesara...@gmail.com wrote: Hello! I'm having the following doubts, how do I connect in my HBase straight from my java web application. For example: I have my

Re: How to debug and run hadoop/HBase source code in eclipse

2011-09-02 Thread Doug Meil
Also, see this... http://hbase.apache.org/book.html#developer On 9/2/11 12:38 AM, Li Pi l...@cloudera.com wrote: So you can't run HBase within eclipse, though you can run all the test cases. git pull HBase, then do mvn eclipse:eclipse, this should setup an eclipse project in the directory.

Re: How to debug and run hadoop/HBase source code in eclipse

2011-09-02 Thread Doug Meil
, troubleshooting) or design questions. The dev dist-list is for people who are changing source. If you have questions about coding Hbase, aim it at the dev-list. Doug On 9/2/11 10:22 AM, Doug Meil doug.m...@explorysmedical.com wrote: Also, see this... http://hbase.apache.org/book.html#developer

Re: Incremental pre-aggregation strategy with MapReduce

2011-09-02 Thread Doug Meil
What Stack says. Plus, for other tips see... http://hbase.apache.org/book.html#mapreduce http://hbase.apache.org/book.html#schema On 9/2/11 11:15 AM, Stack st...@duboce.net wrote: Can you rely on versioning? If MR job runs once a day, only aggregate whats changed in last day? Turn off

Re: prevent region splits?

2011-09-04 Thread Doug Meil
Along with what Jack said, see this... http://hbase.apache.org/book.html#required_configuration .. and just double check that you don't have scheduled major compactions going off once a day (the default) On 9/3/11 7:54 PM, Jack Levin magn...@gmail.com wrote: Make hbase.hregion.max.filesize

Re: HBase Meetup during Hadoop World NYC '11

2011-09-06 Thread Doug Meil
Explorys is sending a few, so +3 or so. On 9/6/11 4:49 AM, Steven Noels stev...@outerthought.org wrote: On Wed, Aug 31, 2011 at 5:21 AM, Todd Lipcon t...@cloudera.com wrote: I haven't gotten many responses so far. If there doesn't seem to be much interest, I may not spend the time to

Site and Book updated

2011-09-07 Thread Doug Meil
Hi folks- Stack deployed the book update last night and this contains some new material (more MapReduce examples). http://hbase.apache.org/book.html Plus, the book and website contain the new logo! Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: Copying tables from one server to another

2011-09-07 Thread Doug Meil
Hi there- Have you tried this? http://hbase.apache.org/book.html#copytable It's the Java invocation of the copy-table function (without the ruby) On 9/7/11 8:29 PM, Tom Goren t...@tomgoren.com wrote: So I have read http://blog.sematext.com/2011/03/11/hbase-backup-options/ The built-in

Re: bulk insert

2011-09-10 Thread Doug Meil
What do you mean? On 9/10/11 8:05 AM, sriram rsriram...@gmail.com wrote: Is there any problem for using tablemapreduceutil

Re: Speculative execution and TableOutputFormat

2011-09-10 Thread Doug Meil
Hi Bryan, yep, that same advice is in the hbase book. http://hbase.apache.org/book.html#mapreduce.specex That's a good suggestion, and perhaps moving that config to TableMapReduceUtil would be beneficial. On 9/10/11 4:22 PM, Bryan Keller brya...@gmail.com wrote: I believe there is a

Re: HBase best practice and Regions confusion

2011-09-13 Thread Doug Meil
Hi there- Regarding EC2, see this in the Hbase book... http://hbase.apache.org/book.html#trouble.ec2 Regarding ROOT/META, see this in the Hbase book http://hbase.apache.org/book.html#arch.catalog On 9/13/11 6:16 AM, Ronen Itkin ro...@taykey.com wrote: Hi all, How are you? I am new to

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Doug Meil
Chris, agreed... There are times when reducers aren't required, and then situations where they are useful. We have both kinds of jobs. For others following the thread, I updated the book recently with more MR examples (read-only, read-write, read-summary)

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Doug Meil
I was in the middle of responding to Mike's email when yours arrived, so I'll respond to both. I think the temp-table idea is interesting. The caution is that a default temp-table creation will be hosted on a single RS and thus be a bottleneck for aggregation. So I would imagine that you would

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Doug Meil
I'll add this to the book in the MR section. On 9/16/11 8:22 PM, Doug Meil doug.m...@explorysmedical.com wrote: I was in the middle of responding to Mike's email when yours arrived, so I'll respond to both. I think the temp-table idea is interesting. The caution is that a default temp

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Doug Meil
for a limited set of rows that just care about the aggregations, but receive a lot of traffic. I wonder if this will also be the case, if I was to use the source table to maintain these temporary records, and not create a temp table on the fly ... On Fri, Sep 16, 2011 at 5:24 PM, Doug Meil doug.m

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Doug Meil
, Doug Meil doug.m...@explorysmedical.com wrote: However, if the aggregations in the mapper were kept in a HashMap (key being the aggregate, value being the count), and then the mapper made a single pass over this map during the cleanup method and then did the checkAndPuts, it would mean
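A rough sketch of the pattern described in this message: keep the counts in a HashMap while mapping, then make a single pass over the map during cleanup. This is plain Java rather than a real Hadoop Mapper, and the names InMapperAggregation and sink are hypothetical. The sink callback stands in for whatever write the job performs (the thread mentions checkAndPut). My reading of the truncated message is that this gives one write per distinct aggregate instead of one per input record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class InMapperAggregation {

    // Key is the aggregate, value is the running count.
    private final Map<String, Long> counts = new HashMap<>();

    // Hypothetical stand-in for the table write done at the end of the task.
    private final BiConsumer<String, Long> sink;

    InMapperAggregation(BiConsumer<String, Long> sink) {
        this.sink = sink;
    }

    // Called once per input record: only touches local memory.
    void map(String aggregateKey) {
        counts.merge(aggregateKey, 1L, Long::sum);
    }

    // Called once when the task finishes: one write per distinct aggregate.
    void cleanup() {
        counts.forEach(sink);
        counts.clear();
    }

    public static void main(String[] args) {
        List<String> writes = new ArrayList<>();
        InMapperAggregation mapper =
                new InMapperAggregation((key, count) -> writes.add(key + "=" + count));
        for (String record : new String[] {"a", "b", "a", "a"}) {
            mapper.map(record);
        }
        mapper.cleanup();
        System.out.println(writes.size() + " writes for 4 records");
    }
}
```

The map is held in task memory, so this only suits jobs where the number of distinct aggregates per task is modest.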

Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-19 Thread Doug Meil
to write MR-Jobs. Regards, Thomas -Original Message- From: Doug Meil [mailto:doug.m...@explorysmedical.com] Sent: Freitag, 16. September 2011 21:42 To: user@hbase.apache.org Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ... Chris, agreed... There are sometimes

Re: MR-Job custom property management?

2011-09-21 Thread Doug Meil
If you need to pass args to a mapper-task for example, using the Configuration object (as Stack suggested) is a common pattern. But just don't stick anything huge in there because that config is apparently copied repeatedly in the MR framework. On 9/21/11 12:09 PM, Stack st...@duboce.net

Re: table version question

2011-09-25 Thread Doug Meil
You won't be able to see previous versions of a row. If that's not important to you, then changing it to 1 will save you space. Per.. http://hbase.apache.org/book.html#schema.versions ... The older versions are cleaned up after a major compaction. On 9/25/11 6:45 AM, Rita

Re: setTimeRange for HBase Increment

2011-09-29 Thread Doug Meil
: Doug Meil may point you to related doc. Take a look at this as well: https://issues.apache.org/jira/browse/HBASE-4241 On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp jame...@bronto.com wrote: Hm, well I didn't mention a number of other requirements for the feature I'm building, but long story short

Re: Error creating table.

2011-09-30 Thread Doug Meil
Is this happening in a standalone instance? Remote cluster? Can you provide some context? On 9/30/11 12:06 PM, sidusa mail david.ger...@sidusa.com wrote: I am getting this error when I try and create a table using the hbase shell. ERROR:

Re: REST or Thrift or something else?

2011-10-04 Thread Doug Meil
I would hazard a guess that the Java client is the most utilized, followed by REST and Thrift. But if your client application isn't in Java then the Java client obviously won't work. On 10/4/11 9:57 AM, Jim R. Wilson wilson.ji...@gmail.com wrote: Hi HBase users, Which is the more popular

Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

2011-10-04 Thread Doug Meil
wrote: Are you surmising that from the description of setting a minimum version? On Tue, Oct 4, 2011 at 2:31 PM, Doug Meil doug.m...@explorysmedical.com wrote: http://hbase.apache.org/book.html#schema.versions I believe if you set that to 0 it should disable the versioning. On 10/4/11 2

Re: range query

2011-10-05 Thread Doug Meil
Hi there- Check out the Hbase book... http://hbase.apache.org/book.html#scan On 10/5/11 3:29 AM, Rita rmorgan...@gmail.com wrote: Hello, I have a simple table where the data looks like this, key,value 2011-01-01.foo,data01 2011-01-02.foo,data02 2011-01-03.foo,data03 2011-01-04.foo,data04

Re: Error running org.apache.hadoop.examples.DBCountPageView

2011-10-07 Thread Doug Meil
Hi there- I don't think this is an HBase question. On 10/7/11 10:35 AM, Ta, Le (Clovis) ta...@ne.bah.com wrote: Hi, I am getting the following exception when trying to run the DBCountPageView example obtained from http://search-hadoop.com/c/Map/Reduce:/src/examples/org/apache/hadoop/exam

Re: How can HBase return the metadata to the client?

2011-10-08 Thread Doug Meil
Hi there- There is a section in the book on this... http://hbase.apache.org/book.html#arch.catalog On 10/8/11 6:19 AM, yonghu yongyong...@gmail.com wrote: Hello, I read the blog http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html and tried to understand the

Re: basic question for newbie

2011-10-10 Thread Doug Meil
You'll want to review this too... http://hbase.apache.org/book.html#schema On 10/10/11 2:51 AM, Sonal Goyal sonalgoy...@gmail.com wrote: One possible schema for your case could be: rowkey: book. Column familty: author, qualifiers one, two, three... AND similar table for authors rowkey:

Book site update

2011-10-11 Thread Doug Meil
(min/max now separate sub-sections) * Schema Design: Schema Design Smackdown! A new section has been added in this chapter for commonly asked schema design questions. (Spoiler alert: more rows) * OpsMgt: A first-cut capacity planning section for storage calcs has been added. Doug Meil

Re: Book site update

2011-10-11 Thread Doug Meil
versions, *but keep at least M versions around* (where M is the value for minimum number of row versions, M=N) Please refer to discussion toward the end of HBASE-4536 where Lars suggested dropping support for the case of M==N 0. Cheers On Tue, Oct 11, 2011 at 12:59 PM, Doug Meil doug.m

Book updated

2011-10-17 Thread Doug Meil
at the end. * More Performance/Arch information in light of recent discussions (including EC2 and HDFS) * Other minor edits. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: setting ulimit in os Lion X

2011-10-17 Thread Doug Meil
Jignesh, for an example of the error that JD is citing (and other common errors) see the Troubleshooting chapter. http://hbase.apache.org/book.html#trouble On 10/17/11 2:43 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It always depends on the scope of your test :) For me the defaults

Re: integrating hadoop and Hbase with eclipse

2011-10-19 Thread Doug Meil
In addition to what Jonathan just said, see http://hbase.apache.org/book.html#ides On 10/19/11 3:05 AM, Jonathan Gray jg...@fb.com wrote: Not sure what kind of integration you're talking about, but if just want to create a project with the HBase source then just grab an SVN checkout of an

Re: Custom timestamps

2011-10-21 Thread Doug Meil
Stack, he might be referring to this... http://hbase.apache.org/book.html#versions ... I updated this recently based on an exchange with JD and somebody else. On 10/21/11 1:08 AM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi St. Ack, I read something while browsing . Right now don't have

Re: data mining

2011-10-25 Thread Doug Meil
And the HBase book. (http://hbase.apache.org/book.html) To second what JD already said, the best way to learn HBase is to use it. On 10/24/11 4:32 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Mon, Oct 24, 2011 at 1:21 PM, Jignesh Patel jigneshmpa...@gmail.com wrote: J-D, Thanks

Re: sum, avg, count, etc...

2011-10-26 Thread Doug Meil
Hi there- First, make sure you aren't tripping on any of these issues.. http://hbase.apache.org/book.html#perf.reading On 10/26/11 6:21 AM, Rita rmorgan...@gmail.com wrote: I am trying to do some simple statistics with my data but its taking longer than expected. Here is how my data is

Re: A requirement to change time of the Hbase cluster.

2011-10-26 Thread Doug Meil
Hi there- As a reminder about running without an NTP server, the config section of the book strongly cautions against this: http://hbase.apache.org/book.html#ntp On 10/25/11 10:36 PM, Gaojinchao gaojinc...@huawei.com wrote: Thanks for your reply. Our application scenario, The equipment and

Re: HBase, Hive, Hive over HBase or Pig over HBase

2011-10-26 Thread Doug Meil
re: 30 million records. We're obviously pro-HBase on this dist-list but one of the challenges of HBase (and Hadoop in general) is that the architecture can tend to be overkill on smaller datasets. That doesn't mean you shouldn't try HBase, but expectations should be tempered. Especially with

Re: Lease does not exist exceptions

2011-10-27 Thread Doug Meil
I'll add something in the docs. On 10/27/11 3:35 AM, Lucian Iordache lucian.george.iorda...@gmail.com wrote: Yep. did not work entirely. I had a job to run on 1000 regions. And the caching was 200. The job crashed with a lot of ClosedChannelExceptions + LeaseExceptions. Set the caching to 10

Re: HBase, Hive, Hive over HBase or Pig over HBase

2011-10-27 Thread Doug Meil
point me to an example use case scenario that has taken this approach? Thanks Vivek On Thu, Oct 27, 2011 at 1:27 AM, Doug Meil doug.m...@explorysmedical.comwrote: re: 30 million records. We're obviously pro-HBase on this dist-list but one of the challenges of HBase (and Hadoop in general

Re: Multiple values for the cell

2011-10-28 Thread Doug Meil
Per what Stack just said... http://hbase.apache.org/book.html#supported.datatypes On 10/28/11 12:38 PM, Stack st...@duboce.net wrote: On Fri, Oct 28, 2011 at 9:28 AM, Jain, Kokil ja...@bit-sys.com wrote: Is it possible for cell values to be lists? We don't support 'lists' natively. The

Re: JBoss 7 with Thrift or Avro

2011-10-28 Thread Doug Meil
See... http://hbase.apache.org/book.html#client ... The hbase client talks directly to the RegionServers. On 10/28/11 2:24 PM, Jignesh Patel jigneshmpa...@gmail.com wrote: If I use Java there will be 1-1 mapping between 1 JVM to 1 region server. While I am not sure, I thought(presume)

Re: sum, avg, count, etc...

2011-10-29 Thread Doug Meil
rows. Otherwise you can wind up getting back just the data you expect, but still scanning all the way to the end of the table, just filtering out all the remaining rows. On Wed, Oct 26, 2011 at 6:18 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- First, make sure you aren't

Re: new to hbase

2011-11-04 Thread Doug Meil
In addition to what Joey said, it's worth taking inventory of your current pain points. What are your current challenges? Ad-hoc queries? Creating summaries? If it's entirely ad-hoc and your users want SQL, then I think Joey pretty much nailed it. For more info:

Re: How to set region parameter?

2011-11-09 Thread Doug Meil
Hi there- re: #1 http://hbase.apache.org/book.html#perf.configurations Regarding this, for the 0.90.x codebase the largest recommended region size is 4Gb. The 20Gb number in the book now was for a cluster which was running Hfile v2 format, which is different than what is in

Re: Row get very slow

2011-11-14 Thread Doug Meil
Hi there- re: The question is : in what application BLOCKSIZE should be changed (increased or decreased) ? See.. http://hbase.apache.org/book.html#schema.creation and... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html On 11/14/11 3:51 AM, Damien Hardy

Re: Web analytics and HBase

2011-11-14 Thread Doug Meil
Hi there- See... http://hbase.apache.org/book.html#rowkey.design http://hbase.apache.org/book.html#mapreduce.example.summary http://hbase.apache.org/book.html#precreate.regions Especially focus on the rowkey part because it mentioned OpenTSDB specifically. On 11/13/11 11:10 PM,

Re: MR - Input from Hbase output to HDFS

2011-11-14 Thread Doug Meil
Glad you worked through that and everything is working. I will add an example of MR to Hbase-to-HDFS in the book. On 11/14/11 1:24 AM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I think that issue is with Filesystem Configuration, as in config, it is picking HbaseConfiguration. When I

Re: HBase block replication

2011-11-14 Thread Doug Meil
Hi there- Look at the Hbase files on the HDFS file system... http://hbase.apache.org/book.html#trouble.namenode.hbase.objects ... and run some checks like 'fsck' On 11/14/11 8:39 PM, Mark static.void@gmail.com wrote: How can this be tested and or verified? Would/should it show up

Re: hbase.regionserver.handler.count

2011-11-15 Thread Doug Meil
Hi there, per.. http://hbase.apache.org/book.html#perf.handlers ... this is a per-RegionServer config, and it would take a pretty big box to satisfy 100 concurrent RS requests. On 11/15/11 11:09 AM, Mark static.void@gmail.com wrote: In the HBase book it states: It is safe to set that

Re: hbase.regionserver.handler.count

2011-11-15 Thread Doug Meil
Yep, I'll update it. Thanks for pointing that out. On 11/15/11 12:43 PM, Mark static.void@gmail.com wrote: Perhaps section 2.8.2.3 of the hbase book should be updated then? On 11/15/11 9:17 AM, Doug Meil wrote: Hi there, per.. http://hbase.apache.org/book.html#perf.handlers

Re: Not able to change the VERSION of hbase row

2011-11-16 Thread Doug Meil
Also, there is a versioned Get example here... http://hbase.apache.org/book.html#get On 11/16/11 12:34 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: You need to tell the Get or Scan to fetch more versions. For example, the help for the get commands gives this example: hbase get

Re: Scans and lexical sorting

2011-11-16 Thread Doug Meil
I'll update this. Thanks. On 11/16/11 12:17 PM, lars hofhansl lhofha...@yahoo.com wrote: Hi Mark, good find. I think that works by accident and the book is wrong. row + new byte[] {0} will use byte[].toString() and actually result in something like: row[B@152b6651, which (again
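The pitfall Lars describes is easy to reproduce in plain Java. A minimal sketch, assuming the book example meant to build a key one zero byte longer than row (the names StopRowSketch, concatWithPlus and appendZeroByte are hypothetical): using + with a byte[] falls back to string concatenation and calls toString() on the array, while Arrays.copyOf appends a real 0x00 byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StopRowSketch {

    // What the `+` operator does: string concatenation, so the array is
    // rendered through Object.toString() as something like "[B@152b6651".
    static String concatWithPlus(String row) {
        return row + new byte[] {0};
    }

    // Append a single 0x00 byte. The result is the smallest key that sorts
    // strictly after `row` in unsigned lexicographic byte order.
    static byte[] appendZeroByte(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // the padded element is 0
    }

    public static void main(String[] args) {
        // Prints "row" followed by the array's identity string, not a key.
        System.out.println(concatWithPlus("row"));

        byte[] row = "row".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(appendZeroByte(row)));
    }
}
```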

Re: Not able to change the VERSION of hbase row

2011-11-16 Thread Doug Meil
Also, there is an example of a versioned get here... http://hbase.apache.org/book.html#get On 11/16/11 12:34 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: You need to tell the Get or Scan to fetch more versions. For example, the help for the get commands gives this example: hbase get

Re: Scans and lexical sorting

2011-11-17 Thread Doug Meil
I'll update. It's great that people are actually reading the docs now! Keep the comments coming. :-) On 11/17/11 1:49 PM, lars hofhansl lhofha...@yahoo.com wrote: That's better I agree. stop key = start key is valid, though (it becomes a get then). Should definitely mention that

Re: understanding Rowcounter

2011-11-21 Thread Doug Meil
You might submit an MR job via thrift, but you don't want to do a rowcount on a large table without MapReduce. On 11/21/11 7:43 AM, Jahangir Mohammed md.jahangi...@gmail.com wrote: Region is the split used by mapper. Thanks, Jahangir Mohammed. On Mon, Nov 21, 2011 at 6:44 AM, Rita

Re: Region Splits

2011-11-21 Thread Doug Meil
Hi there- The last part of 6.3.2.3 is important: Expect tradeoffs when designing rowkeys. Some of this stuff you just have to prototype. In terms of performance... http://hbase.apache.org/book.html#keyvalue .. if you have huge keys, you'll feel it there. For the MR part, see...

Re: How HBase implements delete operations

2011-11-26 Thread Doug Meil
This is a good question. I'm actually not sure. According to the Delete Javadoc... http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html To delete an entire row, instantiate a Delete object with the row to delete. To further define the scope of what to delete,

Re: How HBase implements delete operations

2011-11-28 Thread Doug Meil
Thanks Lars, I'll update the docs with this. On 11/27/11 6:31 PM, lars hofhansl lhofha...@yahoo.com wrote: That is correct. From: yonghu yongyong...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Sunday, November 27, 2011

Re: [austin-cug] Organization Meeting - Austin ACM Special Interest Group on Knowledge Discovery and Data Mining

2011-11-28 Thread Doug Meil
Hi there- I'm happy for your new group, but can you guys take the hbase user dist-list off this conversation, please? On 11/28/11 2:27 PM, Craig Dupree craig.m.dup...@gmail.com wrote: David, Please slow down, and let the rest of us have a chance to catch up with you. You've gone from

HBase book updated

2011-11-29 Thread Doug Meil
Hi folks- The Apache HBase book has been updated. It's been a few weeks since it's been updated and there are a variety of small improvements in there. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: About the feasibility of hbase in cloud storage

2011-11-29 Thread Doug Meil
Hi there- I'd like to introduce you to the Hbase book: http://hbase.apache.org/book.html There are also 2 sections specifically about running in EC2. http://hbase.apache.org/book.html#perf.ec2 http://hbase.apache.org/book.html#trouble.ec2 On 11/29/11 9:11 PM, 庄阳

Re: regions and tables

2011-12-01 Thread Doug Meil
To expand on what Lars said, there is an example of how this is laid out on disk... http://hbase.apache.org/book.html#trouble.namenode.disk ... regions distribute the table, so two different tables will be distributed by separate sets of regions. On 12/1/11 3:14 AM, Lars George

Re: Performance characteristics of scans using timestamp as the filter

2011-12-01 Thread Doug Meil
Scans work on startRow/stopRow... http://hbase.apache.org/book.html#scan ... you can also select by timestamp *within the startRow/stopRow selection*, but this isn't intended to quickly select rows by timestamp irrespective of their keys. On 12/1/11 9:03 AM, Srikanth P. Shreenivas
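A toy model of the point above, in plain Java rather than the HBase client API. A sorted map stands in for the table, and the names KeyRangeScanModel and scan are hypothetical. The start/stop keys select a contiguous slice of the sorted rows, and the timestamp range is only applied to rows inside that slice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class KeyRangeScanModel {

    // table maps rowkey -> cell timestamp. Returns the rowkeys in
    // [startRow, stopRow) whose timestamp falls in [minTs, maxTs).
    static List<String> scan(NavigableMap<String, Long> table,
                             String startRow, String stopRow,
                             long minTs, long maxTs) {
        List<String> hits = new ArrayList<>();
        // The key range is the primary selection: a slice of the sorted map.
        for (Map.Entry<String, Long> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            long ts = row.getValue();
            // The time range is a secondary check on rows inside the slice.
            if (ts >= minTs && ts < maxTs) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = new TreeMap<>();
        table.put("a1", 10L);
        table.put("a2", 50L);
        table.put("b1", 50L);
        // Only rows with keys in ["a", "b") are considered at all.
        System.out.println(scan(table, "a", "b", 40L, 60L));
    }
}
```

In this model, asking for a timestamp range with no useful start/stop keys means walking every row, which is why the key range, not the timestamp, should do the narrowing.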

Re: Scan Metrics in Ganglia

2011-12-01 Thread Doug Meil
This can be a bit tricky because of the scan caching, for example... http://hbase.apache.org/book.html#rs_metrics 12.4.2.14. hbase.regionserver.requests Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a

Apache HBase Book updated and Home Page updated

2011-12-02 Thread Doug Meil
Hi folks- The home page (http://hbase.apache.org) was reorganized and updated. Similar info, just a bit easier to read. Also, the book has been updated as well. Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: Implementation of Filters using Thrift and Ruby

2011-12-05 Thread Doug Meil
I believe that these filters... http://hbase.apache.org/book.html#thrift.filter-language ... were added to .92. This should be marked as such in the book. I'll update it. On 12/4/11 11:48 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I tried using Filters with Java Api and it works
