[Hadoop Wiki] Trivial Update of "Hive/LanguageManual/LanguageManual/Sampling" by RaghothamMurthy

Apache Wiki Wed, 21 Jan 2009 15:24:41 -0800

The following page has been changed by RaghothamMurthy:
http://wiki.apache.org/hadoop/Hive/LanguageManual/LanguageManual/Sampling

------------------------------------------------------------------------------
- The TABLESAMPLE clause allows the users to write queries for samples of the 
data instead of the whole table. The TABLESAMPLE clause can be added to any 
table in the FROM clause.
  
  Syntax:
  {{{
  table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])
  }}}
  
- The buckets are numbered starting from 0. '''colname''' indicates the column 
on which to sample each row in the table. colname can be one of the 
non-partition columns in the table or '''rand()''' indicating sampling on the 
entire row instead of an individual column. The rows of the table are 
'bucketed' on the colname randomly into y buckets numbered 0 through y. Rows 
which belong to bucket x are returned.  
+ The TABLESAMPLE clause allows users to write queries against a sample of the 
data instead of the whole table. The TABLESAMPLE clause can be added to any 
table in the FROM clause. The buckets are numbered starting from 0. 
'''colname''' indicates the column on which to sample each row in the table. 
colname can be one of the non-partition columns in the table or '''rand()''', 
which indicates sampling on the entire row instead of an individual column. The 
rows of the table are 'bucketed' on the colname randomly into y buckets 
numbered 0 through y-1. Rows which belong to bucket x are returned.  
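
The bucketing rule described above can be sketched in Python. This is only an illustration under stated assumptions, not Hive's implementation: the hash function (CRC32), the helper names `bucket_of` and `table_sample`, and the sample rows are all hypothetical. Buckets are numbered from 0, as in the text.

```python
import zlib


def bucket_of(value, y):
    """Map a column value to one of y buckets, numbered from 0.

    CRC32 is an assumed stand-in for whatever hash the engine uses.
    """
    return zlib.crc32(str(value).encode("utf-8")) % y


def table_sample(rows, x, y, colname):
    """Return the rows whose colname value falls into bucket x out of y."""
    if y <= 0:
        raise ValueError("y must be a positive number of buckets")
    return [row for row in rows if bucket_of(row[colname], y) == x]


# Hypothetical rows standing in for a table with columns id and name.
rows = [{"id": i, "name": f"user{i}"} for i in range(1000)]

# Analogous to TABLESAMPLE (BUCKET 3 OUT OF 32 ON id).
sample = table_sample(rows, x=3, y=32, colname="id")
```

Because every row lands in exactly one bucket, sampling each bucket in turn covers the whole table once, which is why a single bucket gives roughly a 1/y sample when the hash spreads values evenly.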
  
  In the following example, the 3rd bucket out of the 32 buckets of the table 
source is chosen. 's' is the table alias.
  {{{
