Re: how does a client locate a region/tablet?

2012-08-23 Thread Doug Meil

For further information about the catalog tables and region-regionserver
assignment, see this…

http://hbase.apache.org/book.html#arch.catalog






On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote:

Thank you, Stack, especially for the smart six-round-trip guess at the
puzzle. :-)

1. "Yeah, the client caches locations, not the data." -- Does this mean each
client caches the location information for the whole HBase cluster, i.e.
which physical server owns which region? If each region is 128 MB, then for
a big cluster (petabyte scale) the number of regions (total data size /
128 MB) is not trivial. Is there any overhead for the client?
2. I am a bit confused by what you mean by "not the data". The location
information the client caches should be the data in the METADATA table,
i.e. the region-to-physical-server mapping. Why do you say it is not data
(do you mean the actual content of each region)?

regards,
Lin

On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote:

 On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote:
  Hello guys,

  I am reading the Bigtable paper on how a client locates a tablet. Section
  5.1, "Tablet Location", says that the client caches all tablet locations.
  I think this means the client caches the root tablet of the METADATA table
  and all other tablets of the METADATA table (i.e. the client caches the
  whole METADATA table?). My question is whether HBase implements this in
  the same or a similar way. My concern is that, supposing each tablet or
  region file is 128 MB, caching all tablets or region files of the
  METADATA table would take a huge amount of space (i.e. memory footprint)
  on each client. Is that feasible in real HBase clusters? Thanks.
 

 Yeah, the client caches locations, not the data.


  BTW, another point of confusion for me: Bigtable section 5.1, "Tablet
  Location", says "If the client's cache is stale, the location algorithm
  could take up to six round-trips, because stale cache entries are only
  discovered upon misses (assuming that METADATA tablets do not move very
  frequently)." I do not know how the six round trips are calculated; if
  anyone could answer this puzzle, that would be great. :-)
 

 I'm not sure what the 6 is about either. Here is a guesstimate:

 1. Go to the cached location of the server for a particular user region,
 but the server says it does not have the region; the client's location is
 stale.
 2. Go back to the client-cached META region that holds the user region with
 the row we want, but its location is stale.
 3. Go to the ROOT location to find the new location of META, but ROOT has
 moved; what the client has is stale.
 4. Find the new ROOT location and look up the META region location.
 5. Go to the META region location to find the new user region location.
 6. Go to the server with the user region.

 St.Ack
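
Stack's guesstimate can be checked with a small simulation. This is a minimal sketch, not HBase client code: it models the three catalog levels (ROOT, META, user region) as server names, counts one round trip per server contacted, and assumes that re-reading ROOT's location from the bootstrap service (ZooKeeper in HBase, Chubby in Bigtable) is folded into step 4, as Stack's list does. The `Locations` class and `read_row` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Locations:
    """Where each catalog level lives (hypothetical server names)."""
    root: str  # server hosting ROOT
    meta: str  # server hosting the META region we need
    user: str  # server hosting the user region with our row


def read_row(cluster: Locations, cache: Locations) -> int:
    """Reach the user region, repairing stale cache entries on the way.

    Returns the number of round trips; `cache` is updated in place.
    """
    trips = 1                      # step 1: try the cached user-region server
    if cache.user == cluster.user:
        return trips
    trips += 1                     # step 2: ask the cached META server
    if cache.meta != cluster.meta:
        trips += 1                 # step 3: ask the cached ROOT server
        if cache.root != cluster.root:
            # ROOT moved too: take its new location from the bootstrap
            # service (not counted) and query the new ROOT (step 4).
            cache.root = cluster.root
            trips += 1
        cache.meta = cluster.meta
        trips += 1                 # step 5: query the correct META server
    cache.user = cluster.user
    trips += 1                     # step 6: read from the correct user server
    return trips
```

With every cached level stale, `read_row(Locations("rs1", "rs2", "rs3"), Locations("old", "old", "old"))` returns 6; with only the user-region entry stale it returns 3, and with a fully fresh cache it returns 1.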





Re: how does a client locate a region/tablet?

2012-08-23 Thread Lin Ma
Doug, a very informative document. Thanks a lot!

I read through it and have some thoughts:

- Suppose that at the beginning the client-side cache of region information
is empty, and the client wants to GET row key 123 from table ABC.
- The client will read the ROOT table first. Unfortunately, ROOT only
contains region information for the META table (please correct me if I am
wrong), not region information for the real data table (e.g. table ABC).
- Does the client have to call each META region server one by one to find
which META region holds the region location for row key 123 of table ABC?

BTW, I think it would be better if the .META. region key exposed which range
of tables/regions each META region covers; that would save the time spent
iterating over the META region servers one by one. Please feel free to
correct me if I am wrong.

regards,
Lin
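
On the iteration question: as far as we understand, the client does not need to scan catalog rows one by one, because META row keys begin with the table name and the region's start key (roughly `table,startkey,regionid`), and catalog rows are kept sorted. Finding the region for row 123 of table ABC is therefore a single "closest row at or before" lookup. The sketch below imitates that with Python's `bisect` over hypothetical, simplified `(table, start_key)` tuples; real HBase compares raw bytes with a catalog-specific comparator, and the server names are made up.

```python
import bisect

# Hypothetical, simplified META contents: one row per user region, sorted by
# (table, start_key). An empty start_key marks a table's first region, so a
# table's regions together cover its whole key space.
META_ROWS = [
    (("ABC", ""), "rs1:60020"),
    (("ABC", "100"), "rs2:60020"),
    (("ABC", "200"), "rs3:60020"),
    (("XYZ", ""), "rs1:60020"),
]
META_KEYS = [key for key, _ in META_ROWS]


def locate(table: str, row: str) -> str:
    """Return the server of the region holding `row` in `table`.

    Finds the last catalog key <= (table, row): a "closest row at or
    before" lookup, O(log n) instead of scanning every catalog row.
    """
    i = bisect.bisect_right(META_KEYS, (table, row)) - 1
    if i < 0 or META_KEYS[i][0] != table:
        raise KeyError(f"no region of {table!r} covers row {row!r}")
    return META_ROWS[i][1]


print(locate("ABC", "123"))  # rs2:60020
```

Row `"123"` falls between start keys `"100"` and `"200"`, so the lookup lands directly on the second ABC region without touching the others.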

On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil doug.m...@explorysmedical.com wrote:


Re: how does a client locate a region/tablet?

2012-08-23 Thread Lin Ma
Doug,

Some more thoughts after reading about the HRegionInfo data structure
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html):
the start key and end key look informative, and perhaps we could leverage
them.

- I am not sure whether we could use this information (stored as part of
the value in the ROOT table) to find which META region contains the region
server information for row key 123 of table ABC.
- Unfortunately, I think this information is stored in the value of the ROOT
table rather than in its key, so we would have to iterate over the ROOT
table row by row to figure out which META region server to access.

I am not sure I have understood this correctly. Please feel free to correct me.

regards,
Lin
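
The start and end keys Lin noticed define a half-open range [start key, end key) for each region, with an empty end key meaning "through the last row". Because catalog rows are sorted by start key, only one candidate row needs checking, and the check itself is cheap. The HRegionInfo class linked above appears to offer a `containsRow(byte[])` check for this; the Python class below is a hypothetical stand-in, not the HBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionInfo:
    """Hypothetical stand-in for HRegionInfo: one region's key range."""
    table: str
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "through the last row"

    def contains_row(self, row: bytes) -> bool:
        return self.start_key <= row and (self.end_key == b"" or row < self.end_key)


# Three regions that together cover every possible row of table ABC.
REGIONS = [
    RegionInfo("ABC", b"", b"100"),
    RegionInfo("ABC", b"100", b"200"),
    RegionInfo("ABC", b"200", b""),
]

# Linear check, for demonstration only: exactly one region owns each row.
for row in (b"050", b"123", b"200", b"zzz"):
    owners = [r for r in REGIONS if r.contains_row(row)]
    assert len(owners) == 1
    print(row, owners[0].start_key, owners[0].end_key)
```

Since the ranges are contiguous and non-overlapping, a sorted lookup on the start key followed by one `contains_row` check is enough to find the owner.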

On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma lin...@gmail.com wrote:


Re: how does a client locate a region/tablet?

2012-08-23 Thread Harsh J
HBase currently keeps a single META region (it doesn't split it). ROOT
holds the META region's location, and META holds a few rows for each
table (one per region). See also the class MetaScanner.

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma lin...@gmail.com wrote:

-- 
Harsh J
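
To make Harsh's description concrete, here is a toy model of the two catalog levels, assuming simplified row keys: ROOT holds a single row pointing at the single META region, and META holds one row per user region, keyed roughly as `table,startkey,regionid` (newer HBase versions also append an encoded region name). Listing a table's regions is then a walk over the META rows sharing the table's prefix, which is loosely what a MetaScanner-style scan does. All row keys and server addresses are hypothetical.

```python
# Hypothetical, simplified catalog contents (row key -> hosting server).
ROOT = {".META.,,1": "rs9:60020"}       # one row: where the META region lives
META = {
    "ABC,,1345700000000": "rs1:60020",  # first region of ABC (empty start key)
    "ABC,100,1345700000001": "rs2:60020",
    "ABC,200,1345700000002": "rs3:60020",
    "XYZ,,1345700000003": "rs1:60020",
}


def meta_location() -> str:
    """Read the single ROOT row to find the META region's server."""
    (server,) = ROOT.values()           # exactly one META region
    return server


def regions_of(table: str) -> list[tuple[str, str]]:
    """Return (start_key, server) for each region of `table`, in key order."""
    prefix = table + ","
    regions = []
    # For simplicity, filter every sorted row; a real scan would start at
    # the table's first row and stop at the first row of the next table.
    for row_key in sorted(META):
        if row_key.startswith(prefix):
            start_key = row_key[len(prefix):].rsplit(",", 1)[0]
            regions.append((start_key, META[row_key]))
    return regions


print(meta_location())
print(regions_of("ABC"))
```

Because the separator is a comma, the prefix `"AB,"` does not match rows of table `ABC`, so each table's rows form one contiguous run in the sorted catalog.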


Re: how does a client locate a region/tablet?

2012-08-23 Thread Harsh J
Lin,

On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma lin...@gmail.com wrote:
 Thanks, Harsh!

 - "HBase currently keeps a single META region (it doesn't split it)." --
 Does that mean there is only one row in the ROOT table, which points to the
 only META region?

Yes, that is currently the case. We disabled multiple META regions at
some point; I am not sure exactly why, but perhaps it was too complex to
maintain.

 - In Bigtable, it seems they have multiple META regions (tablets). Is that
 an advantage over HBase? :-)

Well, it depends. A single META region hasn't proven to be a scalability
bottleneck for anyone yet. Like any other region, it can easily serve
millions of rows if needed, and in the deployments I have seen, the META
table rarely grows that big.

-- 
Harsh J
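
A back-of-envelope check of Harsh's "millions of rows" remark, using the 128 MB region size from earlier in the thread: 1 PiB of user data in 128 MiB regions needs 2^50 / 2^27 = 2^23, about 8.4 million regions, and so that many META rows. The 512 bytes per META row below is purely an assumed figure for illustration, not a measured one.

```python
MIB = 2 ** 20
PIB = 2 ** 50


def region_count(total_bytes: int, region_bytes: int) -> int:
    """Regions (and hence META rows) needed, rounding up."""
    return -(-total_bytes // region_bytes)


rows = region_count(PIB, 128 * MIB)
ASSUMED_BYTES_PER_META_ROW = 512  # hypothetical, not measured

print(rows)                                          # 8388608
print(rows * ASSUMED_BYTES_PER_META_ROW / 2 ** 30)   # 4.0 (GiB)
```

Under this assumed row size, all of META for a petabyte of data would be a few GiB; a client, in turn, only needs the entries for the regions it actually reads.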


RE: how does a client locate a region/tablet?

2012-08-23 Thread Pamecha, Abhishek
I too thought there were multiple META regions but just one ROOT. Maybe I
am mixing up Bigtable and HBase.

Thanks,
Abhishek


-----Original Message-----
From: Lin Ma [mailto:lin...@gmail.com] 
Sent: Thursday, August 23, 2012 9:41 AM
To: user@hbase.apache.org; ha...@cloudera.com
Cc: doug.m...@explorysmedical.com
Subject: Re: how does a client locate a region/tablet?

Thanks, Harsh!

- "HBase currently keeps a single META region (it doesn't split it)." -- Does
that mean there is only one row in the ROOT table, which points to the only
META region?
- In Bigtable, it seems they have multiple META regions (tablets). Is that an
advantage over HBase? :-)

regards,
Lin
On Thu, Aug 23, 2012 at 11:48 PM, Harsh J ha...@cloudera.com wrote:


Re: how does a client locate a region/tablet?

2012-08-23 Thread Lin Ma
Thank you, Harsh. You answered my question. I like the current architecture
of HBase, which is designed to be extensible in the future -- we have a
two-layer index data structure, and we can use the second layer when
specific problems require it. It's like buying a four-bedroom house but only
living in one room until you have more children. :-)

regards,
Lin

On Fri, Aug 24, 2012 at 12:46 AM, Harsh J ha...@cloudera.com wrote:


Re: how does a client locate a region/tablet?

2012-08-23 Thread Lin Ma
Me too, Abhishek -- you are not alone. But it is good to learn and discuss
here to understand the various design choices.

regards,
Lin

On Fri, Aug 24, 2012 at 1:06 AM, Pamecha, Abhishek apame...@x.com wrote:


Re: how does a client locate a region/tablet?

2012-08-19 Thread Lars George
That is spot on, Stack. It is the worst-case scenario you describe, i.e. all
cached information is stale.

Lars

On Aug 19, 2012, at 6:40 AM, Stack st...@duboce.net wrote:


Re: how does a client locate a region/tablet?

2012-08-19 Thread Lin Ma
Thank you, Stack, especially for the smart six-round-trip guess at the
puzzle. :-)

1. "Yeah, the client caches locations, not the data." -- Does this mean each
client caches the location information for the whole HBase cluster, i.e.
which physical server owns which region? If each region is 128 MB, then for
a big cluster (petabyte scale) the number of regions (total data size /
128 MB) is not trivial. Is there any overhead for the client?
2. I am a bit confused by what you mean by "not the data". The location
information the client caches should be the data in the METADATA table,
i.e. the region-to-physical-server mapping. Why do you say it is not data
(do you mean the actual content of each region)?

regards,
Lin
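
On the cache-size question: a client does not need a copy of all of META. A common design, and our understanding of the HBase client, is to cache a location only after looking it up, and to drop an entry when a region server reports that it no longer hosts that region. The class below is a hypothetical sketch of such a per-table cache keyed by region start key; it assumes purely lazy caching, while the real client may also prefetch neighbouring entries.

```python
import bisect
from typing import Optional


class LocationCache:
    """Hypothetical per-table client cache of region locations.

    Entries are added only after a lookup (lazy caching), so the cache holds
    the regions this client has touched, not the whole META table.
    """

    def __init__(self) -> None:
        self._starts = []   # sorted region start keys
        self._entries = {}  # start key -> (end key, server)

    def _index(self, row: str) -> int:
        """Position in _starts of the cached region covering `row`, or -1."""
        i = bisect.bisect_right(self._starts, row) - 1
        if i >= 0:
            end = self._entries[self._starts[i]][0]
            if end == "" or row < end:  # "" marks the table's last region
                return i
        return -1

    def put(self, start: str, end: str, server: str) -> None:
        """Remember a location returned by a META lookup."""
        if start not in self._entries:
            bisect.insort(self._starts, start)
        self._entries[start] = (end, server)

    def get(self, row: str) -> Optional[str]:
        """Cached server for `row`, or None (the caller must ask META)."""
        i = self._index(row)
        return None if i < 0 else self._entries[self._starts[i]][1]

    def invalidate(self, row: str) -> None:
        """Drop the entry covering `row`, e.g. after a 'not serving' error."""
        i = self._index(row)
        if i >= 0:
            del self._entries[self._starts.pop(i)]

    def __len__(self) -> int:
        return len(self._entries)
```

A read would call `get(row)` first, fall back to a META lookup plus `put(...)` when it returns `None`, and call `invalidate(row)` when a region server rejects the request; the cache then grows with the set of regions the client actually uses.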

On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote:

 On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote:
  Hello guys,
 
  I am reading the Bigtable paper about how a client locates a tablet.
  Section 5.1, Tablet Location, mentions that the client will cache all
  tablet locations. I think this means the client caches the root tablet of
  the METADATA table and all other METADATA tablets (i.e. the client caches
  the whole METADATA table?). My question is whether HBase implements this in
  the same or a similar way. My concern, or confusion, is this: supposing
  each tablet or region file is 128 MB, it would take a huge amount of space
  (i.e. memory footprint) for each client to cache all the tablets or region
  files of the METADATA table. Is that doable or feasible in real HBase
  clusters? Thanks.
 

 Yeah, the client caches locations, not the data.


  BTW, another point of confusion for me: Bigtable section 5.1, Tablet
  Location, says "If the client’s cache is stale, the location algorithm
  could take up to six round-trips, because stale cache entries are only
  discovered upon misses (assuming that METADATA tablets do not move very
  frequently)." I do not know how the six round trips are calculated; if
  anyone could answer this puzzle, it would be great. :-)
 

 I'm not sure what the 6 is about either.  Here is a guesstimate:

 1. Go to the cached location of the server for a particular user region,
 but the server says it does not have the region; the client's location is
 stale.
 2. Go back to the client's cached location of the meta region that holds the
 user region w/ the row we want, but that location is stale too.
 3. Go to the root location to find the new location of meta, but root has
 moved, so what the client has is stale.
 4. Find the new root location and look up the meta region location.
 5. Go to the meta region location to find the new user region location.
 6. Go to the server w/ the user region.

 St.Ack



how client location a region/tablet?

2012-08-18 Thread Lin Ma
Hello guys,

I am reading the Bigtable paper about how a client locates a tablet.
Section 5.1, Tablet Location, mentions that the client will cache all tablet
locations. I think this means the client caches the root tablet of the
METADATA table and all other METADATA tablets (i.e. the client caches the
whole METADATA table?). My question is whether HBase implements this in the
same or a similar way. My concern, or confusion, is this: supposing each
tablet or region file is 128 MB, it would take a huge amount of space (i.e.
memory footprint) for each client to cache all the tablets or region files of
the METADATA table. Is that doable or feasible in real HBase clusters? Thanks.
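Section 5.1 of the Bigtable paper sizes the METADATA table itself. Each METADATA row holds roughly 1 KB in memory, and METADATA tablets are capped at 128 MB. According to the paper, this lets the three-level scheme address 2^34 tablets (2^61 bytes in 128 MB tablets). A short arithmetic check of those figures, treating 1 KB and 128 MB as powers of two:

```python
ROW_BYTES = 2**10      # ~1 KB of location data per tablet (one METADATA row)
TABLET_BYTES = 2**27   # 128 MB tablet size limit

# One METADATA tablet holds this many location rows.
rows_per_tablet = TABLET_BYTES // ROW_BYTES

# The root tablet (never split) locates METADATA tablets; each METADATA
# tablet locates user tablets.
metadata_tablets = rows_per_tablet
user_tablets = metadata_tablets * rows_per_tablet
addressable_bytes = user_tablets * TABLET_BYTES

print(rows_per_tablet.bit_length() - 1)    # 17
print(user_tablets.bit_length() - 1)       # 34
print(addressable_bytes.bit_length() - 1)  # 61
```

Put differently, location data is about 1/131072 of the data it points to (1 KB per 128 MB). The METADATA table is small compared to the cluster even before any client-side caching.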

BTW, another point of confusion for me: Bigtable section 5.1, Tablet Location,
says "If the client’s cache is stale, the location algorithm could take up to
six round-trips, because stale cache entries are only discovered upon misses
(assuming that METADATA tablets do not move very frequently)." I do not know
how the six round trips are calculated; if anyone could answer this puzzle,
it would be great. :-)

have a good weekend,
Lin


Re: how client location a region/tablet?

2012-08-18 Thread Stack
On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote:
 Hello guys,

 I am reading the Bigtable paper about how a client locates a tablet.
 Section 5.1, Tablet Location, mentions that the client will cache all tablet
 locations. I think this means the client caches the root tablet of the
 METADATA table and all other METADATA tablets (i.e. the client caches the
 whole METADATA table?). My question is whether HBase implements this in the
 same or a similar way. My concern, or confusion, is this: supposing each
 tablet or region file is 128 MB, it would take a huge amount of space (i.e.
 memory footprint) for each client to cache all the tablets or region files of
 the METADATA table. Is that doable or feasible in real HBase clusters? Thanks.


Yeah, the client caches locations, not the data.
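To illustrate "locations, not the data", here is a minimal sketch of a client-side location cache. It maps row-key ranges to the name of the server hosting each region and holds no row data. The class and method names are hypothetical, not HBase's client API. The sketch assumes regions cover contiguous [start, end) key ranges, with an empty end key meaning the end of the table, and that entries are added only as the client looks regions up.

```python
import bisect
from typing import List, Optional, Tuple


class RegionLocationCache:
    """Hypothetical client-side cache: region key ranges -> server name, no row data."""

    def __init__(self) -> None:
        self._starts: List[bytes] = []                      # sorted region start keys
        self._entries: List[Tuple[bytes, bytes, str]] = []  # (start, end, server), same order

    def _index(self, row: bytes) -> Optional[int]:
        # Rightmost cached region whose start key is <= row ...
        i = bisect.bisect_right(self._starts, row) - 1
        if i < 0:
            return None
        _, end, _ = self._entries[i]
        # ... that also contains row; an empty end key means "to the end of the table".
        return i if end == b"" or row < end else None

    def put(self, start: bytes, end: bytes, server: str) -> None:
        """Record a location learned from a META lookup."""
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            self._entries[i] = (start, end, server)  # newer location replaces older
        else:
            self._starts.insert(i, start)
            self._entries.insert(i, (start, end, server))

    def get(self, row: bytes) -> Optional[str]:
        """Server believed to host row, or None if it must be looked up in META."""
        i = self._index(row)
        return None if i is None else self._entries[i][2]

    def evict(self, row: bytes) -> None:
        """Drop the entry for row, e.g. after the server says it no longer hosts it."""
        i = self._index(row)
        if i is not None:
            del self._starts[i]
            del self._entries[i]


cache = RegionLocationCache()
cache.put(b"", b"m", "rs1")   # region [start of table, "m") on rs1
cache.put(b"m", b"", "rs2")   # region ["m", end of table) on rs2
print(cache.get(b"apple"))    # rs1
print(cache.get(b"zebra"))    # rs2
cache.evict(b"zebra")         # stale entry discovered on a miss
print(cache.get(b"zebra"))    # None: must go back to META
```

A real client must also cope with splits and merges, where one cached range is replaced by others. This sketch only overwrites entries with the same start key.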


 BTW, another point of confusion for me: Bigtable section 5.1, Tablet Location,
 says "If the client’s cache is stale, the location algorithm could take up to
 six round-trips, because stale cache entries are only discovered upon misses
 (assuming that METADATA tablets do not move very frequently)." I do not know
 how the six round trips are calculated; if anyone could answer this puzzle,
 it would be great. :-)


I'm not sure what the 6 is about either.  Here is a guesstimate:

1. Go to the cached location of the server for a particular user region,
but the server says it does not have the region; the client's location is
stale.
2. Go back to the client's cached location of the meta region that holds the
user region w/ the row we want, but that location is stale too.
3. Go to the root location to find the new location of meta, but root has
moved, so what the client has is stale.
4. Find the new root location and look up the meta region location.
5. Go to the meta region location to find the new user region location.
6. Go to the server w/ the user region.
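The guesstimate above can be checked with a small simulation. It is a sketch under stated assumptions, not HBase or Bigtable code. There is one region per level (hypothetical labels ROOT, META, user), and a request to a server that no longer hosts a region fails, which is the only way a stale entry is discovered. As in step 4, reading the new ROOT location from the coordination service is folded into the ROOT lookup rather than counted separately.

```python
from typing import Dict

# Hypothetical labels for the three levels: ROOT locates META, META locates user.
PARENT = {"user": "META", "META": "ROOT"}


class LocatingClient:
    def __init__(self, cache: Dict[str, str]):
        self.cache = dict(cache)  # region -> server the client *believes* hosts it
        self.round_trips = 0

    def request(self, placement: Dict[str, str], region: str) -> str:
        """Send one request to the server hosting `region`, repairing stale entries."""
        while True:
            server = self.cache.get(region)
            if server is None:
                server = self._lookup(placement, region)
                self.cache[region] = server
            self.round_trips += 1           # one request to a server
            if placement[region] == server:
                return server               # server really hosts the region
            del self.cache[region]          # stale entry, discovered only on this miss

    def _lookup(self, placement: Dict[str, str], region: str) -> str:
        if region == "ROOT":
            # Assumption: the coordination-service read (ZooKeeper in HBase,
            # Chubby in Bigtable) is not counted, mirroring step 4 above.
            return placement["ROOT"]
        # Ask the parent region where `region` lives now; that request is counted.
        self.request(placement, PARENT[region])
        return placement[region]


def count_trips(cache: Dict[str, str], placement: Dict[str, str]) -> int:
    client = LocatingClient(cache)
    client.request(placement, "user")
    return client.round_trips


placement = {"ROOT": "rs3", "META": "rs2", "user": "rs1"}
stale = {"ROOT": "old3", "META": "old2", "user": "old1"}
print(count_trips(stale, placement))      # 6: every level stale, steps 1-6 above
print(count_trips({}, placement))         # 3: empty cache
print(count_trips(placement, placement))  # 1: cache fully up to date
```

Under this accounting, an empty cache costs three round trips and a fully stale cache costs six. Counting the coordination-service read separately, or leaving out the final request to the user region, would shift each total by one.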

St.Ack