[ https://issues.apache.org/jira/browse/PHOENIX-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842749#comment-17842749 ]

Viraj Jasani commented on PHOENIX-7309:
---------------------------------------

{quote}HBase provides an API to create a table with a split points text file.
{quote}
HBase only provides this as a shell capability; there is no public Admin API that 
allows clients to provide a split points file as input while creating a table.

This is what the shell does when SPLITS_FILE is provided:
{code:ruby}
if arg.key?(SPLITS_FILE)
  splits_file = arg.delete(SPLITS_FILE)
  unless File.exist?(splits_file)
    raise(ArgumentError, "Splits file #{splits_file} doesn't exist")
  end
  arg[SPLITS] = []
  File.foreach(splits_file) do |line|
    arg[SPLITS].push(line.chomp)
  end
  tdb.setValue(SPLITS_FILE, splits_file)
end

if arg.key?(SPLITS)
  splits = Java::byte[][arg[SPLITS].size].new
  idx = 0
  arg.delete(SPLITS).each do |split|
    splits[idx] = org.apache.hadoop.hbase.util.Bytes.toBytesBinary(split)
    idx += 1
  end
elsif arg.key?(NUMREGIONS) || arg.key?(SPLITALGO)
...
...
...
if splits.nil?
  # Perform the create table call
  @admin.createTable(tdb.build)
else
  # Perform the create table call
  @admin.createTable(tdb.build, splits)
end{code}
 

Here, the shell creates an array of byte[] split keys and uses the public Admin API:
{code:java}
/**
 * Creates a new table with an initial set of empty regions defined by the specified split keys.
 * The total number of regions created will be the number of split keys plus one. Synchronous
 * operation. Note : Avoid passing empty split key.
 * @param desc      table descriptor for table
 * @param splitKeys array of split keys for the initial regions of the table
 * @throws IllegalArgumentException if the table name is reserved, if the split keys are repeated
 *                                  and if the split key has empty byte array.
 * @throws org.apache.hadoop.hbase.MasterNotRunningException if master is not running
 * @throws TableExistsException if table already exists (If concurrent threads, the table may
 *                              have been created between test-for-existence and
 *                              attempt-at-creation).
 * @throws IOException if a remote or network exception occurs
 */
default void createTable(TableDescriptor desc, byte[][] splitKeys) throws IOException {
  get(createTableAsync(desc, splitKeys), getSyncWaitTimeout(), TimeUnit.MILLISECONDS);
}
{code}
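
For context, a minimal client-side usage sketch of that existing API follows; the table name, column family, and split keys here are made up purely for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableWithSplits {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Example descriptor: table 't1' with a single column family 'f1'.
      TableDescriptor desc = TableDescriptorBuilder
          .newBuilder(TableName.valueOf("t1"))
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f1"))
          .build();
      // Three split keys => four initial regions.
      byte[][] splitKeys = {
        Bytes.toBytes("row100"), Bytes.toBytes("row200"), Bytes.toBytes("row300")
      };
      admin.createTable(desc, splitKeys);
    }
  }
}
{code}
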
While Phoenix can do something similar to what the HBase shell does, I believe that 
rather than Phoenix having to read the whole split file (with potentially 10k or 
50k split keys) and build the split keys array itself, it would be great if HBase 
could provide a public Admin API that takes a split file as input. This has the 
advantage of building any file-content-specific validations into HBase Admin 
directly, and HBase Admin could take care of reading the file contents and 
creating the split key array, without the client worrying about file size.
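
To make the proposal concrete, a rough sketch of what such an overload on Admin might look like is below. To be clear, this method does not exist in HBase today; the signature and the validation hook are purely illustrative, and it simply delegates to the existing createTable(desc, splitKeys):
{code:java}
// Hypothetical overload on org.apache.hadoop.hbase.client.Admin -- NOT an
// existing HBase API. Uses java.nio.file.Files/Path. Reads one split key per
// line, decoded with the same Bytes.toBytesBinary encoding the shell uses,
// then delegates to the existing createTable(desc, splitKeys).
default void createTable(TableDescriptor desc, Path splitsFile) throws IOException {
  byte[][] splitKeys = Files.readAllLines(splitsFile).stream()
      .map(String::trim)
      .filter(line -> !line.isEmpty())
      .map(Bytes::toBytesBinary)
      .toArray(byte[][]::new);
  // File-content validations (duplicate keys, empty keys, ordering) could
  // live here, inside HBase, instead of being re-implemented by each client.
  createTable(desc, splitKeys);
}
{code}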

> Support specifying splits.txt file while creating a table.
> ----------------------------------------------------------
>
>                 Key: PHOENIX-7309
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7309
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Rushabh Shah
>            Priority: Major
>
> Currently the Phoenix grammar supports specifying split points while creating a 
> table.
> See the grammar [here|https://phoenix.apache.org/language/index.html#create_table]
> {noformat}
> CREATE TABLE IF NOT EXISTS "my_case_sensitive_table"
>     ( "id" char(10) not null primary key, "value" integer)
>     DATA_BLOCK_ENCODING='NONE',VERSIONS=5,MAX_FILESIZE=2000000 split on (?, ?, ?)
> {noformat}
> This works fine if you have a few split points (fewer than 10-20). 
> But if you want to specify split points in the 1,000s (or 10,000s), then this 
> API becomes very cumbersome to use.
> HBase provides an API to create a table with a split points text file.
> {noformat}
>   hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
> {noformat}
> We should also have support in Phoenix to provide split points in a text file.


