Hey St.Ack:

Thank you for your reply.

I chose to start with HBase after getting the answer to my original post
on the hadoop list :)

As of now I use two fields to form a composite key. Other fields are
organized into one column family.
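Concretely, the composite key I have in mind looks roughly like the sketch below (the field names, widths, and the seqId field are just placeholders for illustration, not our final schema). The idea is that encoding the fields as fixed-width big-endian integers makes HBase's byte-wise row ordering agree with numeric order, so the "position of match" range query turns into a plain start-row/stop-row scan:

```java
import java.nio.ByteBuffer;

// Rough sketch of a composite row key: 8-byte "position of match"
// followed by an 8-byte sequence id (both assumed non-negative, so
// big-endian signed encoding sorts the same as the numbers do).
public class CompositeKey {

    // Build a 16-byte row key. ByteBuffer writes big-endian by default.
    static byte[] rowKey(long position, long seqId) {
        return ByteBuffer.allocate(16).putLong(position).putLong(seqId).array();
    }

    // Scan boundaries for: position > 200 and position + 36 < 200,000.
    static byte[] startRow() { return rowKey(201, 0); }            // inclusive
    static byte[] stopRow()  { return rowKey(200000 - 36, 0); }    // exclusive

    // Unsigned byte-wise comparison, i.e. the order HBase keeps rows in.
    static int compareUnsigned(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (c != 0) return c;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        byte[] a = rowKey(201, 7);
        byte[] b = rowKey(199963, 7);
        // Byte-wise order matches numeric order on position.
        System.out.println(compareUnsigned(a, b) < 0);
    }
}
```

With keys laid out like this, the range query from my original post would just be a scan from startRow to stopRow, with no filters needed.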

I will discuss with my manager and see what she thinks about getting more
nodes to continue the testing.

Thanks!
Xueling

On Thu, Dec 17, 2009 at 4:40 PM, stack <[email protected]> wrote:

> Hey Xueling:
>
> Now I notice that you are the fellow who recently wrote up on the hadoop
> list.
>
> Todd's described scheme I take it won't work for you then? There'd be fewer
> moving parts for sure.
>
> Up on hadoop list you gave a description of your records as so:
>
> "1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0 1 4 *103570835* F .. 23G 24
>
> "The highlighted field is called "position of match" and the query we are
> interested in is the # of sequences in a certain range of this "position of
> match". For instance the range can be "position of match" > 200 and
> "position of match" + 36 < 200,000."
>
> What are you thinking with regard to the row key?  Will each of the fields
> above be concatenated as row key, or will they each be individual columns,
> all in the one column family or in many?
>
> I'd suggest you get some subset of your dataset, say a million records or
> so.  This should load into a single HBase node fine.  Use this small
> dataset to figure out the schema that best serves the way you'll be
> querying the data.
>
> If you can get away with a single family, work on writing an import that
> writes HFiles directly:
>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
>
> It'll run an order of magnitude or more faster than going via the API.
>
> Now, as to the size of the cluster, see the presentations section where
> Ryan describes the hardware used to load up a 9B-row table.  His hardware
> might be more than you need.  I'd suggest you start with 4 or 5 nodes and
> see how loading goes.  Check query latency.  If the numbers are not to
> your liking, add more nodes.  HBase generally scales linearly.
>
> Hope this helps,
> St.Ack
>
> On Thu, Dec 17, 2009 at 4:00 PM, Xueling Shu <[email protected]> wrote:
>
> > Hi St.Ack:
> >
> > Wondering how many nodes in a cluster you would recommend to hold 5B
> > rows? Eventually we need to handle X times 5B rows. I want to get an
> > idea of how many resources we need.
> >
> > Thanks,
> > Xueling
> >
> >
> > On Thu, Dec 17, 2009 at 3:45 PM, stack <[email protected]> wrote:
> >
> > > Hey Xueling, 5B into a single node ain't going to work.  Get yourself a
> > > bit of a cluster somewhere.  Single node is for messing around.  Not
> > > for doing 'real' stuff.
> > >
> > > St.Ack
> > >
> > >
> > > On Thu, Dec 17, 2009 at 3:29 PM, stack <[email protected]> wrote:
> > >
> > > > On Thu, Dec 17, 2009 at 2:38 PM, Xueling Shu <[email protected]> wrote:
> > > >
> > > >>
> > > >> Things started fine until 5 mins after the data population started.
> > > >>
> > > >> Here is the exception:
> > > >> Exception in thread "main"
> > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.0.176.64:39045 for region Genome,,1261087437258, row '\x00\x00\x00\x00\x0E\xB00\xAC\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00s\xAD', but failed after 10 attempts.
> > > >> Exceptions:
> > > >> java.io.IOException: java.io.IOException: Server not running, aborting
> > > >>
> > > >
> > > > See why it quit by looking in the regionserver log.
> > > >
> > > > Make sure you have latest hbase and read the 'Getting Started'
> section.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2347)
> > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1826)
> > > >>        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> > > >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
> > > >>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > > >>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > > >>
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >> java.net.ConnectException: Connection refused
> > > >>
> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.doCall(HConnectionManager.java:1193)
> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1115)
> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
> > > >>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
> > > >>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:470)
> > > >>        at HadoopTrigger.populateData(HadoopTrigger.java:126)
> > > >>        at HadoopTrigger.main(HadoopTrigger.java:52)
> > > >>
> > > >> Can anybody let me know how to fix it?
> > > >> Thanks,
> > > >> Xueling
> > > >>
> > > >
> > > >
> > >
> >
>
